Collection of repeat proteins comprising repeat modules

ABSTRACT

The present invention relates to collections of repeat proteins comprising repeat modules which are derived from one or more repeat units of a family of naturally occurring repeat proteins, to collections of nucleic acid molecules encoding said repeat proteins, to methods for the construction and application of such collections and to individual members of such collections.

[0001] The present invention relates to collections of repeat proteins comprising repeat modules which are derived from one or more repeat units of a family of naturally occurring repeat proteins, to collections of nucleic acid molecules encoding said repeat proteins, to methods for the construction and application of such collections and to individual members of such collections.

[0002] A number of documents are cited throughout this specification. The disclosure content of these documents is herewith incorporated by reference.

[0003] Protein-protein interactions, or more generally, protein-ligand interactions, play an important role in all organisms, and the understanding of the key features of recognition and binding is one focus of current biochemical research. Up to now, antibodies and their derivatives have mainly been used in this field of research. However, antibody technology is afflicted with well-known disadvantages. For instance, antibodies can hardly be applied intracellularly due to the reducing environment in the cytoplasm. Thus, there exists a need for high affinity binding molecules with characteristics that overcome the restrictions of antibodies. Such molecules will most probably provide new solutions in medicine, biotechnology, and research, where intracellular binders will also become increasingly important in genomics.

[0004] Various efforts to construct novel binding proteins have been reported (reviewed in Nygren and Uhlen, 1997). The most promising strategy seemed to be a combination of limited library generation and screening or selection for the desired properties. Usually, existing scaffolds were recruited to randomise some exposed amino acid residues after analysis of the crystal structure. However, despite progress in terms of stability and expressibility, the affinities reported so far are considerably lower than those of antibodies (Ku and Schultz, 1995). A constraint might be the limitation to targets for which the crystal structure is known (Kirkham et al., 1999) or which are homologous to the original target molecule, so that no universal scaffold for binding has been identified so far. To increase the apparent affinity of binders after screening, several approaches have used multimerisation of single binders to take advantage of avidity effects.

[0005] Thus, the technical problem underlying the present invention is to identify novel approaches for the construction of collections of binding proteins.

[0006] The solution to this technical problem is achieved by providing the embodiments characterised in the claims. Accordingly, the present invention allows constructing collections of repeat proteins comprising repeat modules. The technical approach of the present invention, i.e. to derive said modules from the repeat units of naturally occurring repeat proteins, is neither provided nor suggested by the prior art.

[0007] Thus, the present invention relates to collections of nucleic acid molecules encoding collections of repeat proteins, each repeat protein comprising a repeat domain, which comprises a set of consecutive repeat modules, wherein each of said repeat modules is derived from one or more repeat units of one family of naturally occurring repeat proteins, wherein said repeat units comprise framework residues and target interaction residues, wherein said repeat proteins differ in at least one position corresponding to one of said target interaction residues.

[0008] In the context of the present invention, the term “collection” refers to a population comprising at least two different entities or members. Preferably, such a collection comprises at least 10⁵, more preferably more than 10⁷, and most preferably more than 10⁹ different members. A “collection” may as well be referred to as a “library” or a “plurality”.

[0009] The term “nucleic acid molecule” refers to a polynucleotide molecule, which is a ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) molecule, either single stranded or double stranded. A nucleic acid molecule may either be present in isolated form, or be comprised in recombinant nucleic acid molecules or vectors.

[0010] The term “repeat proteins” refers to a (poly)peptide/protein comprising one or more repeat domains (FIG. 1). Preferably, each of said repeat proteins comprises up to four repeat domains. More preferably, each of said repeat proteins comprises up to two repeat domains. However, most preferably, each of the repeat proteins comprises one repeat domain. Furthermore, said repeat protein may comprise additional non-repeat protein domains (FIG. 2a and 2b), (poly)peptide tags and/or (poly)peptide linker sequences (FIG. 1). The term “(poly)peptide tag” refers to an amino acid sequence attached to a (poly)peptide/protein, where said amino acid sequence is usable for the purification, detection, or targeting of said (poly)peptide/protein, or where said amino acid sequence improves the physicochemical behavior of said (poly)peptide/protein, or where said amino acid sequence possesses an effector function. Such (poly)peptide tags may be small polypeptide sequences, for example, His_(n) (Hochuli et al., 1988; Lindner et al., 1992), myc, FLAG (Hopp et al., 1988; Knappik and Plückthun, 1994), or Strep-tag (Schmidt and Skerra, 1993; Schmidt and Skerra, 1994; Schmidt et al., 1996). These (poly)peptide tags are all well known in the art and are fully available to the person skilled in the art. Additional non-repeat domains may be further moieties such as enzymes (for example alkaline phosphatase), which allow the detection of said repeat proteins, or moieties which can be used for targeting (such as immunoglobulins or fragments thereof) and/or as effector molecules. The individual (poly)peptide tags, moieties and/or domains of a repeat protein may be connected to each other directly or via (poly)peptide linkers. The term “(poly)peptide linker” refers to an amino acid sequence, which is able to link, for example, two protein domains, a (poly)peptide tag and a protein domain, or two sequence tags. Such linkers, for example glycine-serine linkers of variable lengths (e.g. Forrer and Jaussi, 1998), are known to the person skilled in the relevant art.

[0011] In the context of the present invention, the term “(poly)peptide” relates to a molecule consisting of one or more chains of multiple, i.e. two or more, amino acids linked via peptide bonds.

[0012] The term “protein” refers to a (poly)peptide, where at least part of the (poly)peptide has, or is able to acquire, a defined three-dimensional arrangement by forming secondary, tertiary, or quaternary structures within and/or between its (poly)peptide chain(s). If a protein comprises two or more (poly)peptides, the individual (poly)peptide chains may be linked non-covalently or covalently, e.g. by a disulfide bond between two (poly)peptides. A part of a protein, which individually has, or is able to acquire, a defined three-dimensional arrangement by forming secondary or tertiary structures is termed “protein domain”. Such protein domains are well known to the practitioner skilled in the relevant art.

[0013] The term “family of naturally occurring repeat proteins” refers to a group of naturally occurring repeat proteins, where the members of said group comprise similar repeat units. Protein families are well known to the person skilled in the art.

[0014] The term “repeat domain” refers to a protein domain comprising two or more consecutive repeat units (modules) as structural units (FIG. 1), wherein said structural units have the same fold, and stack tightly to create a superhelical structure having a joint hydrophobic core (for a review see Kobe and Kajava, 2000). The term “structural unit” refers to a locally ordered part of a (poly)peptide, formed by three-dimensional interactions between two or more segments of secondary structure that are near one another along the (poly)peptide chain. Such a structural unit comprises a structural motif. The term “structural motif” refers to a three-dimensional arrangement of secondary structure elements present in at least one structural unit. For example, the structural motif repetitively present in LRR proteins consists of a β-strand and an opposing antiparallel helical segment connected by a loop (FIG. 4a). Structural motifs are well known to the person skilled in the relevant art. Said structural units are alone not able to acquire a defined three-dimensional arrangement; however, their consecutive arrangement as repeat modules in a repeat domain leads to a mutual stabilization of neighbouring units resulting in said superhelical structure.

[0015] The term “repeat modules” refers to the repeated amino acid sequences of the repeat proteins encoded by the nucleic acid molecules of the collection of the present invention, which are derived from the repeat units (FIG. 3) of naturally occurring proteins. Each repeat module comprised in a repeat domain is derived from one or more repeat units of one family of naturally occurring repeat proteins.

[0016] Such “repeat modules” may comprise positions with amino acid residues present in all copies of the repeat module (“fixed positions”) and positions with differing or “randomised” amino acid residues (“randomised positions”).

[0017] The term “set of repeat modules” refers to the total number of repeat modules present in a repeat domain. Such “set of repeat modules” present in a repeat domain comprises two or more consecutive repeat modules, and may comprise just one type of repeat module in two or more copies, or two or more different types of modules, each present in one or more copies. The collection of repeat proteins according to the present invention may comprise repeat domains with identical number of repeat modules per corresponding repeat domain (i.e. one set with a fixed number of repeat modules), or may comprise repeat domains, which differ in the number of repeat modules per corresponding repeat domain (i.e. two or more sets with different numbers of repeat modules).

[0018] Preferably, the repeat modules comprised in a set are homologous repeat modules. In the context of the present invention, the term “homologous repeat modules” refers to repeat modules, wherein more than 70% of the framework residues of said repeat modules are homologous. Preferably, more than 80% of the framework residues of said repeat modules are homologous. Most preferably, more than 90% of the framework residues of said repeat modules are homologous. Computer programs to determine the percentage of homology between polypeptides, such as Fasta, Blast or Gap, are known to the person skilled in the relevant art.
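The percentage criterion above can be illustrated with a minimal Python sketch. It is not one of the programs named in the text (Fasta, Blast or Gap). It assumes that the two modules are already aligned and of equal length, that the framework positions are supplied by the user, and that “homologous” means identical or belonging to the same similarity class. The similarity classes, the function names and the example sequences are hypothetical and chosen only for illustration.

```python
# Hypothetical similarity classes; the specification does not define them.
SIMILARITY_CLASSES = [
    set("AVLIMC"),  # aliphatic / small hydrophobic
    set("FWY"),     # aromatic
    set("KRH"),     # basic
    set("DE"),      # acidic
    set("STNQ"),    # polar, uncharged
    set("G"),
    set("P"),
]


def residues_homologous(a: str, b: str) -> bool:
    """Return True if two residues are identical or share a similarity class."""
    if a == b:
        return True
    return any(a in cls and b in cls for cls in SIMILARITY_CLASSES)


def framework_homology(module_a: str, module_b: str, framework_positions) -> float:
    """Percentage of framework positions at which two aligned modules are homologous."""
    if len(module_a) != len(module_b):
        raise ValueError("modules must be aligned to the same length")
    positions = list(framework_positions)
    matches = sum(
        residues_homologous(module_a[i], module_b[i]) for i in positions
    )
    return 100.0 * matches / len(positions)


# Illustrative (made-up) aligned fragments and framework positions (0-based).
framework = [0, 3, 4, 5, 6, 7, 8, 9, 10]
pct = framework_homology("DAKGLTPLHLA", "DEKGDTPLHKA", framework)
is_homologous = pct > 70.0  # threshold taken from the paragraph above
```

In this sketch, seven of the nine framework positions agree, so the two fragments would pass the 70% threshold but not the 80% threshold.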

[0019] Preferably, a repeat module of the present invention is derived from one repeat unit. This may refer to a situation where a collection of nucleic acid molecules, each molecule encoding a repeat domain of the invention, is obtained by random mutagenesis of a nucleic acid molecule encoding a naturally occurring repeat domain. Thus, said repeat domain of the present invention comprises a set of repeat modules, wherein each of said modules is derived from the corresponding repeat unit of said naturally occurring repeat domain. Methods for random mutagenesis of nucleic acid molecules such as error-prone PCR (Wilson and Keefe, 2000) or DNA shuffling (Volkov and Arnold, 2000) are well known to the person skilled in the relevant art. In another situation, a single naturally occurring repeat unit may be used to derive a repeat sequence motif of the present invention.

[0020] More preferably, a repeat module of the present invention is derived from one or more repeat units. This may refer to a situation where two or more homologous nucleic acid molecules, each encoding a naturally occurring repeat domain, are subjected to DNA recombination or random chimeragenesis (Volkov and Arnold, 2000). Thus, said repeat domain of the present invention comprises a set of repeat modules, wherein each of said modules is derived from one or more corresponding repeat units of said homologous naturally occurring repeat domains. Preferably, said homologous nucleic acid molecules possess a DNA sequence identity of at least 75%. More preferably, said sequence identity is at least 85%.

[0021] Most preferably, a repeat module of the present invention is derived from two or more repeat units, where two or more homologous repeat units are used to derive a repeat sequence motif of the present invention. Descriptions of such a derivation process are presented in the examples.

[0022] The term “a repeat module derived from one or more repeat units” refers to

[0023] (i) a process comprising the step of generating a nucleic acid molecule encoding a repeat module by random mutagenesis, e.g. error-prone PCR, of a nucleic acid molecule encoding a repeat unit; or

[0024] (ii) a process comprising the step of generating a nucleic acid molecule encoding a repeat module by random chimeragenesis of two or more homologous nucleic acid molecules each encoding a repeat unit; or

[0025] (iii) a process comprising the analysis of one or more repeat units of naturally occurring repeat proteins and the deduction of a repeat module. This process may comprise the steps of:

[0026] (a) identifying naturally occurring repeat units;

[0027] (b) determining an initial repeat sequence motif by sequence alignments;

[0028] (c) refining the repeat sequence motif by sequence analysis and structural analysis of said repeat units;

[0029] (d) constructing a repeat module according to the repeat sequence motif of (c); or

[0030] (iv) a process comprising the process of (i), (ii) or (iii) followed by further evolution of the repeat module by random mutagenesis or random chimeragenesis.

[0031] The term “repeat unit” refers to amino acid sequences comprising sequence motifs of one or more naturally occurring proteins, wherein said “repeat units” are found in multiple copies, and which exhibit a defined folding topology common to all said motifs determining the fold of the protein. Such repeat units comprise framework residues (FIG. 4d) and interaction residues (FIG. 4e). Examples of such repeat units include leucine-rich repeat units, ankyrin repeat units, armadillo repeat units, tetratricopeptide repeat units, HEAT repeat units, and leucine-rich variant repeat units (reviewed in Kobe & Deisenhofer, 1994; Groves & Barford, 1999; Marino et al., 2000; Kobe, 1996). Naturally occurring proteins containing two or more such repeat units are referred to as “naturally occurring repeat proteins”. The amino acid sequences of the individual repeat units of a repeat protein may have a significant number of mutations, substitutions, additions and/or deletions when compared to each other, while still substantially retaining the general pattern, or motif, of the repeat units.

[0032] Preferably, the repeat units used for the deduction of a repeat sequence motif are homologous repeat units, wherein the repeat units comprise the same structural motif and wherein more than 70% of the framework residues of said repeat units are homologous. Preferably, more than 80% of the framework residues of said repeat units are homologous. Most preferably, more than 90% of the framework residues of said repeat units are homologous.

[0033] The term “repeat sequence motif” refers to an amino acid sequence, which is deduced from one or more repeat units. Such repeat sequence motifs comprise framework residue positions and target interaction residue positions. Said framework residue positions correspond to the positions of framework residues of said repeat units. Said target interaction residue positions correspond to the positions of target interaction residues of said repeat units. Such repeat sequence motifs comprise fixed positions and randomized positions. The term “fixed position” refers to an amino acid position in a repeat sequence motif, wherein said position is set to a particular amino acid. Most often, such fixed positions correspond to the positions of framework residues. The term “randomized position” refers to an amino acid position in a repeat sequence motif, wherein two or more amino acids are allowed at said amino acid position. Most often, such randomized positions correspond to the positions of target interaction residues. However, some positions of framework residues may also be randomized. Amino acid sequence motifs are well known to the practitioner in the relevant art.

[0034] The term “folding topology” refers to the tertiary structure of said repeat units. The folding topology will be determined by stretches of amino acids forming at least parts of α-helices or β-sheets, or amino acid stretches forming linear polypeptides or loops, or any combination of α-helices, β-sheets and/or linear polypeptides/loops.

[0035] The term “consecutive” refers to an arrangement, wherein said modules are arranged in tandem.

[0036] In repeat proteins, there are at least 2, usually about 2 to 6, more usually at least about 6, frequently 20 or more repeat units. For the most part, the repeat proteins are structural proteins and/or adhesive proteins, being present in prokaryotes and eukaryotes, including vertebrates and non-vertebrates. An analogy of ankyrin proteins to antibodies has been suggested (Jacobs and Harrison, 1998).

[0037] In most cases, said repeat units will exhibit a high degree of sequence identity (same amino acid residues at corresponding positions) or sequence similarity (amino acid residues being different, but having similar physicochemical properties), and some of the amino acid residues might be key residues being strongly conserved in the different repeat units found in naturally occurring proteins.

[0038] However, a high degree of sequence variability by amino acid insertions and/or deletions, and/or substitutions between the different repeat units found in naturally occurring proteins will be possible as long as the common folding topology is maintained.

[0039] Methods for directly determining the folding topology of repeat proteins by physicochemical means such as X-ray crystallography, NMR or CD spectroscopy, are well known to the practitioner skilled in the relevant art. Methods for identifying and determining repeat units or repeat sequence motifs, or for identifying families of related proteins comprising such repeat units or motifs, such as homology searches (BLAST etc.) are well established in the field of bioinformatics, and are well known to the practitioner in such art. The step of refining an initial repeat sequence motif may comprise an iterative process.

[0040] Crystal structures have been reported for ankyrin-type repeats (Bork, 1993; Huxford et al., 1998, see FIG. 2g and 2h), the ribonuclease inhibitor (RI) of the leucine-rich repeat (LRR) superfamily (Kobe and Deisenhofer, 1993, see FIG. 2c) and other LRR proteins (see FIG. 2d to 2f). Inspection of these structures revealed an elongated shape in the case of the ankyrin repeats, or a horseshoe shape in the case of the leucine-rich repeats giving rise to an extraordinarily large surface.

[0041] The term “framework residues” relates to amino acid residues of the repeat units, or the corresponding amino acid residues of the repeat modules, which contribute to the folding topology, i.e. which contribute to the fold of said repeat unit (or module) or which contribute to the interaction with a neighboring unit (or module). Such contribution might be the interaction with other residues in the repeat unit (module) (FIG. 4d), or the influence on the polypeptide backbone conformation as found in α-helices or β-sheets, or amino acid stretches forming linear polypeptides or loops. The term “target interaction residues” refers to amino acid residues of the repeat units, or the corresponding amino acid residues of the repeat modules, which contribute to the interaction with target substances. Such contribution might be the direct interaction with the target substances (FIG. 4e), or the influence on other directly interacting residues, e.g. by stabilising the conformation of the (poly)peptide of said repeat unit (module) to allow or enhance the interaction of said directly interacting residues with said target. Such framework and target interaction residues may be identified by analysis of the structural data obtained by the physicochemical methods referred to above, or by comparison with known and related structural information well known to practitioners in structural biology and/or bioinformatics.

[0042] The term “interaction with said target substances” may be, without being limited to, binding to a target, involvement in a conformational change or a chemical reaction of said target, or activation of said target.

[0043] A “target” may be an individual molecule such as a nucleic acid molecule, a (poly)peptide protein, a carbohydrate, or any other naturally occurring molecule, including any part of such individual molecule, or complexes of two or more of such molecules. The target may be a whole cell or a tissue sample, or it may be any non-naturally occurring molecule or moiety.

[0044] The term “differ in at least one position” refers to a collection of repeat proteins, which have at least one position where more than one amino acid may be found. Preferably, such positions are randomised. The term “randomised” refers to positions of the repeat modules, which are variable within a collection and are occupied by more than one amino acid residue in the collection. Preferably, the randomised positions vary additionally between repeat modules within one repeat domain. Preferably, such positions may be fully randomised, i.e. being occupied by the full set of naturally occurring, proteinogenic amino acid residues. More preferably, such positions may be partially randomised, i.e. being occupied by a subset of the full set of naturally occurring amino acid residues. Subsets of amino acid residues may be sets of amino acid residues with common physicochemical properties, such as sets of hydrophobic, hydrophilic, acidic, basic, aromatic, or aliphatic amino acids, subsets comprising all except for certain non-desired amino acid residues, such as sets not comprising cysteines or prolines, or subsets comprising all amino acid residues found at the corresponding position in naturally occurring repeat proteins. The randomisation may be applied to some, preferably to all of the target interaction residues. Methods for making “randomised” repeat proteins such as by using oligonucleotide-directed mutagenesis of the nucleic acid sequences encoding said repeat proteins (e.g. by using mixtures of mononucleotides or trinucleotides (Virnekäs et al., 1994)), or by using error-prone PCR during synthesis of said nucleic acid sequences, are well known to the practitioner skilled in the art.
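The distinction between full and partial randomisation can be sketched in silico as follows. This Python fragment only illustrates the concept at the amino acid level; it does not model the oligonucleotide-based methods mentioned above. The template fragment, the wildcard symbol “x”, and the names `randomise`, `FULL_SET` and `NO_CYS_PRO` are assumptions made for this illustration.

```python
import random

# The 20 naturally occurring proteinogenic amino acids (one-letter codes).
FULL_SET = "ACDEFGHIKLMNPQRSTVWY"

# Example of a partial set: all residues except cysteine and proline.
NO_CYS_PRO = "".join(aa for aa in FULL_SET if aa not in "CP")


def randomise(template: str, allowed: str, rng: random.Random, wildcard: str = "x") -> str:
    """Fill every wildcard position of a template with a residue drawn from `allowed`.

    Fixed positions (any character other than the wildcard) are copied unchanged.
    """
    return "".join(
        rng.choice(allowed) if ch == wildcard else ch for ch in template
    )


# Illustrative template: fixed residues in upper case, randomised positions as "x".
template = "DxxGxTPLHLA"
rng = random.Random(0)  # seeded only to make the sketch reproducible

# A small in-silico "collection" using partial randomisation.
variants = [randomise(template, NO_CYS_PRO, rng) for _ in range(5)]
```

Passing `FULL_SET` instead of `NO_CYS_PRO` would correspond to full randomisation of the same positions.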

[0045] In a preferred embodiment, each of said repeat modules has an amino acid sequence, wherein at least 70% of the amino acid residues correspond either

[0046] (i) to consensus amino acid residues deduced from the amino acid residues found at the corresponding positions of at least two naturally occurring repeat units; or

[0047] (ii) to the amino acid residues found at the corresponding positions in a naturally occurring repeat unit.

[0048] A “consensus amino acid residue” may be found by aligning two or more repeat units based on structural and/or sequence homology determined as described above, and by identifying one of the most frequent amino acid residues for each position in said units (an example is shown in FIG. 5a and 5b). Said two or more repeat units may be taken from the repeat units comprised in a single repeat protein, or from two or more repeat proteins. If two or more amino acid residues are found with a similar probability in said two or more repeat units, the consensus amino acid may be one of the most frequently found amino acids or a combination of said two or more amino acid residues.
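A simple way to picture the deduction of consensus residues is a column-wise frequency count over aligned repeat units. The Python sketch below assumes that the units are already aligned to equal length and treats only exact ties as “similar probability”, reporting them as a combination in square brackets. Both choices, as well as the function name and the example sequences, are assumptions for illustration and not taken from the examples of this specification.

```python
from collections import Counter


def consensus(aligned_units):
    """Deduce a consensus string from aligned repeat units of equal length.

    For each column the most frequent residue is chosen. If several residues
    are equally frequent, they are reported together as "[...]" (a combination).
    """
    if len({len(unit) for unit in aligned_units}) != 1:
        raise ValueError("all repeat units must be aligned to the same length")

    result = []
    for column in zip(*aligned_units):
        counts = Counter(column)
        top = max(counts.values())
        winners = sorted(res for res, n in counts.items() if n == top)
        if len(winners) == 1:
            result.append(winners[0])
        else:
            result.append("[" + "".join(winners) + "]")
    return "".join(result)


# Made-up aligned fragments, used only to exercise the function.
units = [
    "DKNGRTPLHLA",
    "DADGWTPLHLA",
    "DVNGKTALHLA",
]
motif = consensus(units)
```

Columns in which no residue dominates are natural candidates for randomised positions, whereas strongly conserved columns suggest fixed positions.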

[0049] Further preferred is a collection, wherein said set consists of between two and about 30 repeat modules.

[0050] More preferably, said set consists of between 6 and about 15 repeat modules.

[0051] In a yet further preferred embodiment of the present invention, said repeat modules are directly connected.

[0052] In the context of the present invention, the term “directly connected” refers to repeat modules, which are arranged as direct repeats in a repeat protein without an intervening amino acid sequence.

[0053] In a still further preferred embodiment, said repeat modules are connected by a (poly)peptide linker.

[0054] Thus, the repeat modules may be linked indirectly via a (poly)peptide linker as intervening sequence separating the individual modules. An “intervening sequence” may be any amino acid sequence, which allows the individual modules to be connected without interfering with the folding topology or the stacking of the modules. Preferably, said intervening sequences are short (poly)peptide linkers of less than 10, and even more preferably, of less than 5 amino acid residues.

[0055] In a still further preferred embodiment of the collection of the present invention, each of said repeat proteins further comprises an N- and/or a C-terminal capping module (FIG. 1) having an amino acid sequence different from any one of said repeat modules.

[0056] The term “capping module” refers to a polypeptide fused to the N- or C-terminal repeat module of a repeat domain, wherein said capping module forms tight tertiary interactions with said repeat module thereby providing a cap that shields the hydrophobic core of said repeat module at the side not in contact with the consecutive repeat module from the solvent (FIG. 1).

[0057] Said N- and/or C-terminal capping module may be, or may be derived from, a capping unit (FIG. 3) or other domain found in a naturally occurring repeat protein adjacent to a repeat unit. The term “capping unit” refers to a naturally occurring folded (poly)peptide, wherein said (poly)peptide defines a particular structural unit which is N- or C-terminally fused to a repeat unit, wherein said (poly)peptide forms tight tertiary interactions with said repeat unit thereby providing a cap that shields the hydrophobic core of said repeat unit at one side from the solvent. Such capping units may have sequence similarities to said repeat sequence motif.

[0058] In a preferred embodiment, the present invention relates to a collection of nucleic acid molecules, wherein said repeat units are ankyrin repeat units.

[0059] The characteristics of ankyrin repeat proteins have been reviewed (Sedgwick and Smerdon, 1999) and one minimal folding unit has been investigated (Zhang and Peng, 2000). Ankyrin repeat proteins have been studied in some detail, and the data can be used to exemplify the construction of repeat proteins according to the present invention.

[0060] Ankyrin repeat proteins were identified in 1987 through sequence comparisons between four such proteins in Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans. Breeden and Nasmyth reported multiple copies of a repeat unit of approximately 33 residues in the sequences of swi6p, cdc10p, notch and lin-12 (Breeden and Nasmyth, 1987). The subsequent discovery of 24 copies of this repeat unit in the ankyrin protein led to the naming of this repeat unit as the ankyrin repeat (Lux et al., 1990). Later, this repeat unit was identified in several hundred proteins of different organisms and viruses (Bork, 1993; SMART database, Schultz et al., 2000). These proteins are located in the nucleus, the cytoplasm or the extracellular space. This is consistent with the fact that the ankyrin repeat domain of these proteins is independent of disulfide bridges and thus independent of the oxidation state of the environment. The number of repeat units per protein varies from two to more than twenty (SMART database, Schultz et al., 2000). A minimum number of repeat units seems to be required to form a stable folded domain (Zhang and Peng, 2000). On the other hand, there is also some evidence for an upper limit of six repeat units being present in one folded domain (Michaely and Bennet, 1993).

[0061] All tertiary structures of ankyrin repeat units determined so far share a characteristic fold (Sedgwick and Smerdon, 1999) composed of a β-hairpin followed by two antiparallel α-helices and ending with a loop connecting the repeat unit with the next one (FIG. 4c). Domains built of ankyrin repeat units are formed by stacking the repeat units to an extended and curved structure. This is illustrated by the structure of the mouse GA-binding protein beta 1 subunit in FIG. 2h.

[0062] Proteins containing ankyrin repeat domains often contain additional domains (SMART database, Schultz et al., 2000). While the latter domains have variable functions, the function of the ankyrin repeat domain is most often the binding of other proteins, as several examples show (Batchelor et al., 1998; Gorina and Pavletich, 1996; Huxford et al., 1999; Jacobs and Harrison, 1999; Jeffrey et al., 2000). When analysing the repeat units of these proteins, the target interaction residues are mainly found in the β-hairpin and the exposed part of the first α-helix (FIG. 4c). These target interaction residues hence form a large contact surface on the ankyrin repeat domain. This contact surface is exposed on a framework built of stacked units of α-helix 1, α-helix 2 and the loop (FIG. 4c). For an ankyrin repeat protein consisting of five repeat units, this interaction surface contacting other proteins is approximately 1200 Å². Such a large interaction surface is advantageous to achieve high affinities to target molecules. The affinity of IκBα (which contains a domain of six ankyrin repeat units) to the NF-κB heterodimer, for example, is K_(D)=3 nM (Malek et al., 1998), whereas the dissociation constant of human GA-binding protein beta 1 to its alpha unit is K_(D)=0.78 nM (Suzuki et al., 1998). An advantage of the use of ankyrin repeat proteins according to the present invention over widely used antibodies is their potential to be expressed in a recombinant fashion in large amounts as soluble, monomeric and stable molecules (example 2).

[0063] Further preferred is a collection, wherein each of said repeat modules comprises the ankyrin repeat consensus sequence

[0064] DxxGxTPLHLAaxx±±±±±±±±±±GpxpaVpxLLpxGA±±±±±DVNAx,

[0065] wherein “x” denotes any amino acid, “±” denotes any amino acid or a deletion, “a” denotes an amino acid with an apolar side chain, and “p” denotes a residue with a polar side chain. Most preferred is a collection, wherein one or more of the positions denoted “x” are randomised.
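The symbol definitions above map naturally onto a regular expression, which gives one way to check whether a candidate module sequence is compatible with the consensus. The following Python sketch is a hedged illustration: the assignment of residues to the apolar (“a”) and polar (“p”) classes is an assumption, since the specification does not enumerate them, and the function name is hypothetical. The test sequence is one instance of the more specific motif D11G1TPLHLAA11GHLEIVEVLLK2GADVNA1 given later in this specification.

```python
import re

ANY = "ACDEFGHIKLMNPQRSTVWY"
APOLAR = "ACFGILMPVW"  # assumption: residues counted as apolar
POLAR = "DEHKNQRSTY"   # assumption: residues counted as polar

CONSENSUS = "DxxGxTPLHLAaxx±±±±±±±±±±GpxpaVpxLLpxGA±±±±±DVNAx"


def consensus_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate the consensus notation into a compiled regular expression."""
    parts = []
    for symbol in pattern:
        if symbol == "x":        # any amino acid
            parts.append(f"[{ANY}]")
        elif symbol == "±":      # any amino acid or a deletion
            parts.append(f"[{ANY}]?")
        elif symbol == "a":      # apolar side chain
            parts.append(f"[{APOLAR}]")
        elif symbol == "p":      # polar side chain
            parts.append(f"[{POLAR}]")
        else:                    # fixed residue, matched literally
            parts.append(re.escape(symbol))
    return re.compile("".join(parts))


ankyrin_rx = consensus_to_regex(CONSENSUS)

# A 33-residue module with all "±" positions deleted.
candidate = "DKKGKTPLHLAAKKGHLEIVEVLLKNGADVNAK"
matches = ankyrin_rx.fullmatch(candidate) is not None
```

Because every “±” position is optional, the same expression accepts modules of between 33 and 48 residues.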

[0066] Particularly preferred is a collection, wherein each of said repeat modules comprises the ankyrin repeat consensus sequence

[0067] DxxGxTPLHLAxxxGxxxVVxLLLxxGADVNAx, wherein “x” denotes any amino acid.

[0068] Even more preferred is a diverse collection, wherein each of said repeat modules comprises the ankyrin repeat sequence motif

[0069] DxxGxTPLHLAxxxGxxxIVxVLLxxGADVNAx,

[0070] wherein “x” denotes any amino acid.

[0071] Yet more preferred is a diverse collection, wherein each of said repeat modules comprises the ankyrin repeat sequence motif

[0072] D11G1TPLHLAA11GHLEIVEVLLK2GADVNA1,

[0073] wherein 1 represents an amino acid residue selected from the group:

[0074] A, D, E, F, H, I, K, L, M, N, Q, R, S, T, V, W and Y;

[0075] wherein 2 represents an amino acid residue selected from the group:

[0076] H, N and Y.
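To give a feeling for the diversity encoded by the motif above, the following Python sketch expands the symbols “1” and “2” into their residue groups, counts the theoretical number of distinct module sequences, and checks whether a given sequence conforms to the motif. It assumes that all randomised positions vary independently; the function names are hypothetical and introduced only for this illustration.

```python
from math import prod

MOTIF = "D11G1TPLHLAA11GHLEIVEVLLK2GADVNA1"
GROUPS = {
    "1": "ADEFHIKLMNQRSTVWY",  # 17 residues, as listed above
    "2": "HNY",                # 3 residues, as listed above
}


def allowed_residues(motif: str, groups: dict) -> list:
    """Return, for each motif position, the string of residues allowed there."""
    return [groups.get(symbol, symbol) for symbol in motif]


def module_diversity(motif: str, groups: dict) -> int:
    """Theoretical number of distinct module sequences encoded by the motif."""
    return prod(len(allowed) for allowed in allowed_residues(motif, groups))


def conforms(sequence: str, motif: str, groups: dict) -> bool:
    """Check whether a module sequence is an instance of the motif."""
    allowed = allowed_residues(motif, groups)
    return len(sequence) == len(allowed) and all(
        residue in options for residue, options in zip(sequence, allowed)
    )


per_module = module_diversity(MOTIF, GROUPS)  # 17**6 * 3
two_modules = per_module ** 2                 # two independently varied modules
```

The motif has six positions of type “1” and one of type “2”, so a single module can take 17⁶ × 3 = 72,412,707 different sequences; under the independence assumption, the number grows as a power of the module count in a repeat domain.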

[0077] In a further preferred embodiment, the present invention relates to a collection of nucleic acid molecules, wherein said repeat units are leucine-rich repeats (LRR).

[0078] The characteristics and properties of the LRR repeat have been reviewed (Kobe and Deisenhofer, 1994). LRR proteins have been studied in some detail, and the data can be used to exemplify the behaviour of repeat proteins.

[0079] LRR proteins have been identified by their highly conserved consensus of leucine or other hydrophobic residues at positions 2, 5, 7, and 12 (FIG. 4b). However, the significance of this amino acid distribution pattern was only understood when the first structure of an LRR protein, the ribonuclease inhibitor, was solved (FIG. 2c). Recently, further LRR crystal structures have been elucidated (FIG. 2d-2f). A structure of a typical ankyrin repeat domain protein is shown for comparison (FIG. 2g). A single LRR is postulated to always correspond to a β-strand and an antiparallel α-helix (a unique α/β fold, FIG. 4a), surrounding a core made up from leucine or other aliphatic residues only (Kajava, 1998). The overall shape of ribonuclease inhibitor (RI), an LRR protein, could be described as a horseshoe (FIG. 2c) formed by 15 tandem homologous repeats of strictly alternating A-type (29 amino acids) and B-type (28 amino acids) LRR. The alternating nature of the protein was already recognised when the sequence was analysed (FIG. 5a; Lee et al., 1988).

[0080] Interestingly, mammalian RI are characterised by their extreme affinity to their target proteins. For the binding of RNase A to human RI, a K_(i)=5.9×10⁻¹⁴ M (Kobe and Deisenhofer, 1996) was reported, whereas angiogenin was found to be inhibited with K_(i)=7×10⁻¹⁶ M by pig RI (Lee et al., 1989), making this one of the strongest interactions known between proteins. Even the best-binding antibodies feature affinities only up to 1.5×10⁻¹¹ M (Yang et al., 1995). To better understand the outstanding affinity, two RI were co-crystallised with their target proteins. Subsequent analysis of the crystal structures showed that the interactions are mainly electrostatic (Kobe and Deisenhofer, 1996) and the involved amino acids were predominantly found emanating from the inner β-sheet and the loop connecting each unit to its α-helix (FIG. 4b, Kobe and Deisenhofer, 1995). Moreover, the width of the horseshoe-like fold can change slightly to accommodate the target protein (Kobe and Deisenhofer, 1994). The interface between target and inhibitor consists of a “patch-work” of interactions and the tight association originates from the large buried surface area (about 2550 Å²) when the target protein is bound inside the horseshoe, rather than shape complementarity (Kobe and Deisenhofer, 1996).

[0081] When comparing the detailed binding of RNase A and angiogenin (two molecules with only 30% sequence identity) to RI, significant differences became apparent (Chen and Shapiro, 1997). Whereas largely the same residues were involved on the side of RI, the residues of the target protein were not homologous or used different types of bonding (Papageorgiou et al., 1997). In other words, RI evolved in a way which allowed it to bind and inhibit different target molecules by relying on a large number of contacts presented in correct geometrical orientation, rather than optimal complementarity of the residues. This is the basis for a design of new binding molecules, which will have new binding specificities. The shape seems to be predestined for the recognition of large surfaces, thereby allowing a much greater variety of random amino acids to generate a library as compared to the relatively small “variable” domains of antibodies. However, the loops of antibodies seem to be superior if small haptens or deep clefts have to be recognised. In addition, not only the repeats themselves can be varied but also their number, depending on the target molecules.

[0082] Further preferred is a collection, wherein each of said modules comprises the LRR consensus sequence

[0083] xLxxLxLxxN±xaxx±a±±±±a±±a±±x±±,

[0084] wherein “x” denotes any amino acid, “a” denotes an aliphatic amino acid, and “±” denotes any amino acid or a deletion.

[0085] The term “aliphatic amino acid” refers to an amino acid taken from the list of Ala, Gly, Ile, Leu and Val.
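
As an illustrative aid that is not part of the original disclosure, the consensus notation of paragraphs [0083] to [0085] can be translated mechanically into a regular expression: “x” becomes any of the twenty standard amino acids, “a” becomes the aliphatic set of [0085], and “±” becomes an optional residue. The Python sketch below shows one possible translation. The helper names are hypothetical, and the test sequence is merely the first 28 residues of the pair sequence of paragraph [0116] with the placeholders filled by arbitrarily chosen permitted residues.

```python
import re

ANY_RESIDUE = "[ACDEFGHIKLMNPQRSTVWY]"  # the twenty standard amino acids
ALIPHATIC = "[AGILV]"                    # Ala, Gly, Ile, Leu, Val per [0085]

LRR_CONSENSUS = "xLxxLxLxxN±xaxx±a±±±±a±±a±±x±±"  # paragraph [0083]


def consensus_to_regex(consensus):
    """Translate the consensus notation into a compiled regular expression."""
    parts = []
    for symbol in consensus:
        if symbol == "x":
            parts.append(ANY_RESIDUE)
        elif symbol == "a":
            parts.append(ALIPHATIC)
        elif symbol == "±":
            parts.append(ANY_RESIDUE + "?")  # any amino acid or a deletion
        else:
            parts.append(symbol)             # fixed residue such as L or N
    return re.compile("".join(parts))


LRR_PATTERN = consensus_to_regex(LRR_CONSENSUS)


def matches_lrr(sequence):
    """True if the whole sequence is compatible with the LRR consensus."""
    return LRR_PATTERN.fullmatch(sequence) is not None


# Hypothetical example: A-type half of the [0116] pair with 1=D, 2=N, 4=L.
example = "RLEDLDLDDNDLTEAGLKDLASVLRSNP"
print(matches_lrr(example))
```

Because “±” positions may be deleted, a sequence can satisfy the pattern through more than one alignment; the sketch only reports whether at least one alignment exists.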

[0086] Particularly preferred is a collection, wherein at least one of said modules comprises the LRR consensus sequence

[0087] xLExLxLxxCxLTxxxCxxLxxaLxxxx,

[0088] wherein “x” denotes any amino acid, and “a” denotes an aliphatic amino acid (A-type LRR).

[0089] Particularly preferred is furthermore a collection, wherein at least one of said modules comprises the LRR consensus sequence

[0090] xLxELxLxxNxLGDxGaxxLxxxLxxPxx,

[0091] wherein “x” denotes any amino acid, and “a” denotes an aliphatic amino acid (B-type LRR).

[0092] Most preferred is a collection, wherein one or more of the positions denoted “x” and/or “±” are randomised.

[0093] Further preferred is a collection, wherein the cysteine residue at position 10 in the A-type LRR consensus sequence is replaced by a hydrophilic amino acid residue, and wherein the cysteine residue at position 17 is replaced by a hydrophobic amino acid residue.

[0094] A hydrophilic amino acid residue may be taken from the list of Ser, Thr, Tyr, Gln, and Asn.

[0095] A hydrophobic amino acid residue may be taken from the list of Ala, Ile, Leu, Met, Phe, Trp, and Val.

[0096] Compared to single-chain Fv or conventional antibodies, several advantages can be enumerated. Whereas disulfide bridges are crucial for the stability of most antibodies (Proba et al., 1997), no disulfide bonds are required in LRR proteins, which makes intracellular applications possible.

[0097] Therefore, new binding molecules can be generated for application in a reducing environment. This could become an enormously powerful tool in elucidating the function of the numerous proteins identified by the genome sequencing projects by direct inhibition in the cytosol. As for many applications in biotechnology large amounts of expressed and correctly folded proteins are required, production in E. coli is preferable but very difficult for antibodies, which evolved in the oxidising extracellular environment. In contrast, folding or refolding of RI variants is more efficient, as they are naturally found in the cytosol (see Example 1).

[0098] In a further preferred embodiment of a collection according to the present invention, one or more of the amino acid residues in an ankyrin or LRR repeat module as described above are exchanged by an amino acid residue found at the corresponding position in a corresponding naturally occurring repeat unit.

[0099] Preferably, up to 30% of the amino acid residues are exchanged, more preferably, up to 20%, and most preferably, up to 10% of the amino acid residues are exchanged.

[0100] Particularly preferred is a collection, wherein said set consists of one type of repeat modules.

[0101] The term “type of repeat module” refers to the characteristics of a module determined by the length of the module, the number and composition of its “fixed positions” as well as of its “randomised positions”. “Different types of modules” may differ in one or more of said characteristics.

[0102] Further preferred is a collection, wherein said set consists of two different types of repeat modules.

[0103] In a still further preferred embodiment, the present invention relates to a collection, wherein said set comprises two different types of consecutive repeat modules as pairs in said repeat proteins.

[0104] Most preferred is a collection, wherein said two different types of modules are based on said A-type LRR and B-type LRR.

[0105] Further preferred is a collection, wherein the amino acid sequences of the repeat modules comprised in said set are identical for each said type except for the randomised residues.

[0106] Yet further preferred is a collection, wherein the nucleic acid sequences encoding the copies of each said type are identical except for the codons encoding amino acid residues at positions being randomised.

[0107] Particularly preferred is a collection, wherein the nucleic acid molecules encoding said repeat proteins comprise identical nucleic acid sequences of at least 9 nucleotides between said repeat modules.

[0108] Said “identical nucleic acid sequences of at least 9 nucleotides” may be part of the end of only one repeat module, or be formed by the ends of two adjacent repeat modules, or may be part of a (poly)peptide linker connecting two repeat modules.

[0109] In a further preferred collection according to the present invention, the nucleic acid molecules encoding said repeat proteins comprise identical nucleic acid sequences of at least 9 nucleotides between said pairs.

[0110] Said “identical nucleic acid sequences of at least 9 nucleotides” may be part of the end of only one pair of repeat modules, or be formed by the ends of two adjacent pairs of repeat modules, or may be part of a (poly)peptide linker connecting two pairs of repeat modules.

[0111] Most preferable is a collection, wherein each of the nucleic acid sequences between said modules, or said pairs, comprises a restriction enzyme recognition sequence. The term “restriction enzyme recognition sequence” refers to a nucleic acid sequence being recognised and cleaved by a restriction endonuclease. Said restriction enzyme recognition sequence may be divided symmetrically between the 3′ and 5′ ends (e.g. 3 nucleotides of a 6 base pair recognition sequence on both ends), or non-symmetrically (e.g. 2 nucleotides on one end, 4 on the corresponding end).

[0112] Particularly preferred is a collection, wherein each of the nucleic acid sequences between said modules, or said pairs, comprises a nucleic acid sequence formed from cohesive ends created by two compatible restriction enzymes.

[0113] The term “compatible restriction enzymes” refers to restriction enzymes having different recognition sequences but forming compatible cohesive ends when cleaving double stranded DNA. After re-ligation of sticky-end double-stranded DNA fragments produced from two compatible restriction enzymes, the product DNA no longer exhibits the recognition sequence of either restriction enzyme.
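
The behaviour described in paragraph [0113] can be illustrated with the enzyme pair BssHII and MluI named in the legend of FIG. 6. The following Python sketch is not part of the original disclosure. The recognition sequences and cut positions (G^CGCGC for BssHII, A^CGCGT for MluI) are taken from general knowledge of these enzymes rather than from this specification, the flanking bases are hypothetical, and only the top strand is modelled, which suffices here because both sites are palindromic.

```python
# Recognition site (5'->3', top strand) and cut offset within the site.
# Assumption: values from general enzyme knowledge, not from this document.
ENZYMES = {
    "BssHII": ("GCGCGC", 1),  # G^CGCGC
    "MluI": ("ACGCGT", 1),    # A^CGCGT
}


def cut(sequence, enzyme):
    """Split the top strand at the first recognition site of the enzyme."""
    site, offset = ENZYMES[enzyme]
    position = sequence.index(site) + offset
    return sequence[:position], sequence[position:]


def sticky_overhang(enzyme):
    """Return the 5' overhang left by a staggered cut in a palindromic site."""
    site, offset = ENZYMES[enzyme]
    return site[offset:len(site) - offset]


def contains_site(sequence, enzyme):
    return ENZYMES[enzyme][0] in sequence


# Hypothetical fragments: the flanking bases are arbitrary placeholders.
left_fragment, _ = cut("AAAGCGCGCTTT", "BssHII")
_, right_fragment = cut("CCCACGCGTGGG", "MluI")
hybrid = left_fragment + right_fragment  # ligation of compatible ends

print(sticky_overhang("BssHII"), sticky_overhang("MluI"))
print(hybrid, contains_site(hybrid, "BssHII"), contains_site(hybrid, "MluI"))
```

In this toy example the junction reads GCGCGT, which neither enzyme recognises, and in the first reading frame the codons GCG and CGT encode alanine and arginine, in agreement with the legend of FIG. 6.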

[0114] In a further most preferred embodiment of the collection of the present invention, said identical nucleic acid sequences allow a PCR-based assembly of the nucleic acid molecules encoding said repeat proteins.

[0115] In a most preferred embodiment of the collection according to the present invention, said repeat proteins comprise one or more pairs of modules based on said A-type LRR and B-type LRR, wherein each of said pairs has the sequence

[0116] RLE1L1L112DLTEAG4KDLASVLRSNPSLREL3LS3NKLGDAGVRLLLQGLLDPGT,

[0117] wherein 1 represents an amino acid residue selected from the group:

[0118] D, E, N, Q, S, R, K, W and Y;

[0119] wherein 2 represents an amino acid residue selected from the group:

[0120] N, S and T;

[0121] wherein 3 represents an amino acid residue selected from the group:

[0122] G, S, D, N, H and T; and

[0123] wherein 4 represents an amino acid residue selected from the group:

[0124] L, V and M.
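
For illustration only, the pair sequence of paragraph [0116] can be checked against the earlier definitions. The Python sketch below is not part of the original disclosure. It counts the theoretical number of amino acid sequences per pair and confirms that the residue groups used at positions 10 and 17 of the first module fall within the hydrophilic and hydrophobic lists of paragraphs [0094] and [0095]. The split of the pair into 28 and 29 residues is an assumption based on the lengths of the consensus sequences of [0087] and [0090], and all names are hypothetical.

```python
PAIR = "RLE1L1L112DLTEAG4KDLASVLRSNPSLREL3LS3NKLGDAGVRLLLQGLLDPGT"  # [0116]

# Residue groups for the placeholders, copied from [0118] to [0124].
GROUPS = {"1": "DENQSRKWY", "2": "NST", "3": "GSDNHT", "4": "LVM"}

HYDROPHILIC = set("STYQN")    # Ser, Thr, Tyr, Gln, Asn per [0094]
HYDROPHOBIC = set("AILMFWV")  # Ala, Ile, Leu, Met, Phe, Trp, Val per [0095]


def pair_diversity(template, groups):
    """Multiply the number of permitted residues over all placeholders."""
    total = 1
    for symbol in template:
        if symbol in groups:
            total *= len(groups[symbol])
    return total


# Assumed split: first 28 residues as one module, remaining 29 as the other.
first_module, second_module = PAIR[:28], PAIR[28:]

# Positions are 1-based in the text, hence the index offsets below.
position_10 = first_module[9]
position_17 = first_module[16]

print(len(first_module), len(second_module), pair_diversity(PAIR, GROUPS))
print(set(GROUPS[position_10]) <= HYDROPHILIC)
print(set(GROUPS[position_17]) <= HYDROPHOBIC)
```

The count treats every randomised position as independent and every listed residue as equally admissible, so it is an upper bound on sequence diversity rather than a measured library size.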

[0125] Most preferably, each of said pairs of modules is encoded by the nucleic acid molecule

[0126] CGC CTG GAG 111 CTG 111 CTG 111 111 222 GAC CTC ACC GAG GCC GGC 444 AAG GAC CTG GCC AGC GTG CTC CGC TCC AAC CCG AGC CTG CGG GAG CTG 333 CTG AGC 333 AAC AAG CTC GGC GAT GCA GGC GTG CGG CTG CTC TTG CAG GGG CTG CTG GAC CCC GGC ACG,

[0127] wherein 111 represents a codon encoding an amino acid residue selected from the group:

[0128] D, E, N, Q, S, R, K, W and Y;

[0129] wherein 222 represents a codon encoding an amino acid residue selected from the group:

[0130] N, S and T;

[0131] wherein 333 represents a codon encoding an amino acid residue selected from the group:

[0132] G, S, D, N, H and T; and

[0133] wherein 444 represents a codon encoding an amino acid residue selected from the group:

[0134] L, V and M.
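
The correspondence between the codon template and the amino acid pair sequence of paragraph [0116] can be verified by translating the fixed codons with the standard genetic code while passing the placeholder codons through unchanged. The Python sketch below is offered for illustration only and is not part of the original disclosure. The template is written with the AAG and AAC codons as they appear in paragraph [0170], and the function name is hypothetical.

```python
# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

TEMPLATE = (
    "CGC CTG GAG 111 CTG 111 CTG 111 111 222 GAC CTC ACC GAG GCC GGC "
    "444 AAG GAC CTG GCC AGC GTG CTC CGC TCC AAC CCG AGC CTG CGG GAG CTG "
    "333 CTG AGC 333 AAC AAG CTC GGC GAT GCA GGC GTG CGG CTG CTC TTG CAG "
    "GGG CTG CTG GAC CCC GGC ACG"
)


def translate_template(template):
    """Translate fixed codons; map placeholder codons 111..444 to 1..4."""
    residues = []
    for codon in template.split():
        if codon.isdigit():
            residues.append(codon[0])        # randomised position
        else:
            residues.append(CODON_TABLE[codon])
    return "".join(residues)


print(translate_template(TEMPLATE))
```

Comparing the printed string with the sequence of paragraph [0116] gives a quick consistency check on the codon template.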

[0135] In another preferred embodiment one or more of the amino acid residues in at least one pair of modules as listed above are exchanged by an amino acid residue found at the corresponding position in a naturally occurring LRR.

[0136] In yet another preferred embodiment, one or more of the amino acid codons in at least one pair of modules as listed above are exchanged by a codon encoding an amino acid residue found at the corresponding position in a naturally occurring LRR. Preferably, up to 30% of the amino acid residues, or amino acid codons, respectively, are exchanged, more preferably, up to 20%, and most preferably, up to 10% are exchanged.

[0137] In yet another preferred embodiment, one or more of the amino acid codons in at least one pair of modules as listed above are exchanged by a codon encoding an amino acid residue found at the corresponding position in a naturally occurring LRR.

[0138] In a further preferred embodiment, the present invention relates to a collection of recombinant nucleic acid molecules comprising a collection of nucleic acid molecules according to the present invention.

[0139] In the context of the present invention, the term “recombinant nucleic acid molecule” refers to an RNA or DNA molecule which comprises a nucleic acid sequence encoding said repeat protein and further nucleic acid sequences, e.g. non-coding sequences.

[0140] In a still further preferred embodiment, the present invention relates to a collection of vectors comprising a collection of nucleic acid molecules according to the present invention, or a collection of recombinant nucleic acid molecules according to the present invention.

[0141] A vector according to the present invention may be a plasmid, phagemid, cosmid, or a virus- or bacteriophage-based vector, and may be a cloning or sequencing vector, or preferably an expression vector, which comprises all elements required for the expression of nucleic acid molecules from said vector, either in prokaryotic or eukaryotic expression systems. Vectors for cloning, sequencing and expressing nucleic acid molecules are well known to any one of ordinary skill in the art. The vectors containing the nucleic acid molecules of the invention can be transferred into the host cell by well-known methods, which vary depending on the type of cellular host. For example, calcium chloride transfection is commonly utilised for prokaryotic cells, whereas, e.g., calcium phosphate or DEAE-Dextran mediated transfection or electroporation may be used for other cellular hosts; see Sambrook et al. (1989). Such vectors may comprise further genes such as marker genes which allow for the selection of said vector in a suitable host cell and under suitable conditions. Preferably, the nucleic acid molecules of the invention are operatively linked to expression control sequences allowing expression in prokaryotic or eukaryotic cells. Expression of said nucleic acid molecules comprises transcription of the polynucleotide into a translatable mRNA. Regulatory elements ensuring expression in eukaryotic cells, preferably mammalian cells, are well known to those skilled in the art. They usually comprise regulatory sequences ensuring initiation of transcription and, optionally, a poly-A signal ensuring termination of transcription and stabilization of the transcript, and/or an intron further enhancing expression of said nucleic acid molecule. Additional regulatory elements may include transcriptional as well as translational enhancers, and/or naturally-associated or heterologous promoter regions. Possible regulatory elements permitting expression in prokaryotic host cells comprise, e.g., the pL, lac, trp or tac promoter in E. coli, and examples for regulatory elements permitting expression in eukaryotic host cells are the AOX1 or GAL1 promoter in yeast or the CMV-, SV40-, RSV-promoter (Rous sarcoma virus), CMV-enhancer, SV40-enhancer or a globin intron in mammalian and other animal cells. Beside elements which are responsible for the initiation of transcription, such regulatory elements may also comprise transcription termination signals, such as the SV40-poly-A site or the tk-poly-A site, downstream of the nucleic acid molecule. Furthermore, depending on the expression system used, leader sequences capable of directing the (poly)peptide to a cellular compartment or secreting it into the medium may be added to the coding sequence of the nucleic acid molecule of the invention and are well known in the art. The leader sequence(s) is (are) assembled in appropriate phase with translation, initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein, or a portion thereof, into the periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a fusion protein including a C- or N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product. In this context, suitable expression vectors are known in the art such as Okayama-Berg cDNA expression vector pcDV1 (Pharmacia), pCDM8, pRc/CMV, pcDNA1, pcDNA3 (Invitrogen), pSPORT1 (GIBCO BRL) or pCI (Promega) or more preferably pTFT74 (Ge et al., 1995) or a member of the pQE series (Qiagen). Furthermore, the present invention relates to vectors, particularly plasmids, cosmids, viruses and bacteriophages used conventionally in genetic engineering that comprise the polynucleotide of the invention. Preferably, said vector is an expression vector. Methods which are well known to those skilled in the art can be used to construct recombinant viral vectors; see, for example, the techniques described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory (1989) New York and Ausubel et al., Current Protocols in Molecular Biology, Green Publishing Associates and Wiley Interscience, New York (1989).

[0142] Furthermore, the invention relates to a collection of host cells comprising a collection of nucleic acid molecules according to the present invention, a collection of recombinant nucleic acid molecules according to the present invention, or a collection of vectors according to the present invention.

[0143] In the context of the present invention the term “host cell” may be any of a number commonly used in the production of heterologous proteins, including but not limited to bacteria, such as Escherichia coli (Ge et al., 1995), or Bacillus subtilis (Wu et al., 1993a), fungi, such as yeasts (Horwitz et al., 1988; Ridder et al., 1995) or filamentous fungus (Nyyssönen et al., 1993), plant cells (Hiatt, 1990; Hiatt and Ma, 1993; Whitelam et al., 1994), insect cells (Potter et al., 1993; Ward et al., 1995), or mammalian cells (Trill et al., 1995).

[0144] In another embodiment, the present invention relates to a collection of repeat proteins encoded by a collection of nucleic acid molecules according to the present invention, by a collection of vectors according to the present invention, or produced by a collection of host cells according to the present invention.

[0145] Furthermore, the present invention relates to a method for the construction of a collection of nucleic acid molecules according to the present invention, comprising the steps of

[0146] (a) identifying a repeat unit from a repeat protein family;

[0147] (b) identifying framework residues and target interaction residues in said repeat unit;

[0148] (c) deducing at least one type of repeat module comprising framework residues and randomised target interaction residues from at least one member of said repeat protein family; and

[0149] (d) constructing nucleic acid molecules each encoding a repeat protein comprising two or more copies of said at least one type of repeat module deduced in step (c).

[0150] The modes in which this method is to be carried out are explained above in connection with the embodiment of the collection of nucleic acid molecules of the present invention. Descriptions of two such modes are illustrated in the example.

[0151] In a preferred embodiment of this method, said at least one repeat module deduced in step (c) has an amino acid sequence, wherein at least 70% of the amino acid residues correspond either

[0152] (i) to consensus amino acid residues deduced from the amino acid residues found at the corresponding positions of at least two naturally occurring repeat units; or

[0153] (ii) to the amino acid residues found at the corresponding positions in a naturally occurring repeat unit.
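
The 70% criterion of paragraphs [0151] to [0153] can be sketched computationally as follows. This Python example is not part of the original disclosure and is only one possible reading of the criterion. It assumes that the repeat units have already been aligned to equal length, derives the consensus as the most frequent residue per column, and counts randomised positions of the module as non-matching. The toy sequences and all function names are hypothetical and do not represent real repeat units.

```python
from collections import Counter


def column_consensus(aligned_units):
    """Most frequent residue at each column of equal-length aligned units."""
    lengths = {len(unit) for unit in aligned_units}
    if len(lengths) != 1:
        raise ValueError("repeat units must be aligned to equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_units)
    )


def fraction_corresponding(module, reference):
    """Fraction of module positions identical to the reference sequence."""
    if len(module) != len(reference):
        raise ValueError("module and reference must have equal length")
    matches = sum(1 for m, r in zip(module, reference) if m == r)
    return matches / len(reference)


def satisfies_threshold(module, reference, threshold=0.70):
    return fraction_corresponding(module, reference) >= threshold


# Hypothetical toy alignment of three five-residue units.
units = ["DLTGA", "DLSGA", "ELTGA"]
consensus = column_consensus(units)

print(consensus)
print(fraction_corresponding("DLTGK", consensus))
print(satisfies_threshold("DKTGK", consensus))
```

Alternative (ii) of the criterion corresponds to passing a single naturally occurring repeat unit as the reference instead of the derived consensus.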

[0154] Further preferred is a method for the production of a collection of (poly)peptides/proteins according to the present invention, comprising the steps of

[0155] (a) providing a collection of host cells according to the present invention; and

[0156] (b) expressing the collection of nucleic acid molecules comprised in said host cells.

[0157] Particularly preferred is a method for obtaining a repeat protein having a predetermined property, comprising the steps of

[0158] (a) providing a collection of repeat proteins according to the present invention; and

[0159] (b) screening said collection and/or selecting from said collection to obtain at least one repeat protein having said predetermined property.

[0160] The diverse collection of repeat proteins may be provided by several methods in accordance with the screening and/or selection system being used, and may comprise the use of methods such as display on the surface of bacteriophages (WO 90/02809; Smith, 1985; Kay et al., 1996; Dunn, 1996) or bacterial cells (WO 93/10214), ribosomal display (WO 91/05058; WO 98/48008; Hanes et al., 1998), display on plasmids (WO 93/08278) or by using covalent RNA-repeat protein hybrid constructs (WO 00/32823), intracellular expression and selection/screening such as by protein complementation assay (WO 98/341120; Pelletier et al., 1998). In all these methods, the repeat proteins are provided by expression of a corresponding collection of nucleic acid molecules and subsequent screening of the repeat proteins followed by identification of one or more repeat proteins having the desired property via the genetic information connected to the repeat proteins.

[0161] In the context of the present invention the term “predetermined property” refers to a property, which one of the repeat proteins out of the collection of repeat proteins should have, and which forms the basis for screening and/or selecting the collection. Such properties comprise properties such as binding to a target, blocking of a target, activation of a target-mediated reaction, enzymatic activity, and further properties, which are known to one of ordinary skill. Depending on the type of desired property, one of ordinary skill will be able to identify format and necessary steps for performing screening and/or selection.

[0162] Most preferably, the present invention relates to a method, wherein said predetermined property is binding to a target.

[0163] In another embodiment, the invention relates to a repeat protein from a collection according to the present invention.

[0164] Preferably said repeat protein has been obtained by the above-described method and has one of the predetermined properties.

[0165] Furthermore, the present invention relates to a nucleic acid molecule encoding the repeat protein according to the present invention.

[0166] In yet another embodiment, the present invention relates to a vector containing the nucleic acid molecule according to the present invention.

[0167] The present invention relates also to pharmaceutical compositions comprising a repeat protein from a collection of the present invention or a nucleic acid molecule encoding said repeat protein, and optionally a pharmaceutically acceptable carrier and/or diluent.

[0168] Examples of suitable pharmaceutical carriers are well known in the art and include phosphate buffered saline solutions, water, emulsions, such as oil/water emulsions, various types of wetting agents, sterile solutions etc. Compositions comprising such carriers can be formulated by well known conventional methods. These pharmaceutical compositions can be administered to the subject at a suitable dose. Administration of the suitable compositions may be effected by different ways, e.g., by intravenous, intraperitoneal, subcutaneous, intramuscular, topical, intradermal, intranasal or intrabronchial administration. The dosage regimen will be determined by the attending physician and clinical factors. As is well known in the medical arts, dosages for any one patient depend upon many factors, including the patient's size, body surface area, age, the particular compound to be administered, sex, time and route of administration, general health, and other drugs being administered concurrently. A typical dose can be, for example, in the range of 0.001 to 1000 μg (or of nucleic acid for expression or for inhibition of expression in this range); however, doses below or above this exemplary range are envisioned, especially considering the aforementioned factors. Generally, the regimen as a regular administration of the pharmaceutical composition should be in the range of 1 μg to 10 mg units per day. If the regimen is a continuous infusion, it should also be in the range of 1 μg to 10 mg units per kilogram of body weight per minute, respectively. Progress can be monitored by periodic assessment. Dosages will vary but a preferred dosage for intravenous administration of DNA is from approximately 10⁶ to 10¹² copies of the DNA molecule. The compositions of the invention may be administered locally or systemically. Administration will generally be parenteral, e.g., intravenous; DNA may also be administered directly to the target site, e.g., by biolistic delivery to an internal or external target site or by catheter to a site in an artery. Preparations for parenteral administration include sterile aqueous or non-aqueous solutions, suspensions, and emulsions. Examples of non-aqueous solvents are propylene glycol, polyethylene glycol, vegetable oils such as olive oil, and injectable organic esters such as ethyl oleate. Aqueous carriers include water, alcoholic/aqueous solutions, emulsions or suspensions, including saline and buffered media. Parenteral vehicles include sodium chloride solution, Ringer's dextrose, dextrose and sodium chloride, lactated Ringer's, or fixed oils. Intravenous vehicles include fluid and nutrient replenishers, electrolyte replenishers (such as those based on Ringer's dextrose), and the like. Preservatives and other additives may also be present such as, for example, antimicrobials, anti-oxidants, chelating agents, and inert gases and the like. Furthermore, the pharmaceutical composition of the invention may comprise further agents such as interleukins or interferons depending on the intended use of the pharmaceutical composition.

[0169] The repeat proteins comprised in the pharmaceutical compositions of the present invention can comprise a further domain, said domain being linked by covalent or non-covalent bonds. The linkage can be based on genetic fusion according to the methods known in the art and described above or can be performed by, e.g., chemical cross-linking as described in, e.g., WO 94/04686. The additional domain present in the fusion protein comprising the peptide, polypeptide or antibody employed in accordance with the invention may preferably be linked by a flexible linker, advantageously a polypeptide linker, wherein said polypeptide linker comprises plural, hydrophilic, peptide-bonded amino acids of a length sufficient to span the distance between the C-terminal end of said further domain and the N-terminal end of the repeat protein or vice versa. The above described fusion protein may further comprise a cleavable linker or cleavage site for proteinases. Furthermore, said further domain may be of a predefined specificity or function. In this context, it is understood that the repeat proteins present in the pharmaceutical composition according to the invention may be further modified by conventional methods known in the art. This allows for the construction of fusion proteins comprising the repeat protein of the invention and other functional amino acid sequences, e.g., nuclear localization signals, transactivating domains, DNA-binding domains, hormone-binding domains, protein tags (GST, GFP, h-myc peptide, FLAG, HA peptide) which may be derived from heterologous proteins. Thus, administration of the composition of the invention can utilize unlabeled as well as labeled (poly)peptides or antibodies.

[0170] Further preferred is a nucleic acid molecule encoding a pair of repeat modules for the construction of a collection according to the present invention, wherein said nucleic acid molecule is: CGC CTG GAG 111 CTG 111 CTG 111 111 222 GAC CTC ACC GAG GCC GGC 444 AAG GAC CTG GCC AGC GTG CTC CGC TCC AAC CCG AGC CTG CGG GAG CTG 333 CTG AGC 333 AAC AAG CTC GGC GAT GCA GGC GTG CGG CTG CTC TTG CAG GGG CTG CTG GAC CCC GGC ACG,

[0171] wherein 111 represents a codon encoding an amino acid residue selected from the group:

[0172] D, E, N, Q, S, R, K, W and Y;

[0173] wherein 222 represents a codon encoding an amino acid residue selected from the group:

[0174] N, S and T;

[0175] wherein 333 represents a codon encoding an amino acid residue selected from the group:

[0176] G, S, D, N, H and T; and

[0177] wherein 444 represents a codon encoding an amino acid residue selected from the group:

[0178] L, V and M.

[0179] These and other embodiments are disclosed and encompassed by the description and examples of the present invention. Further literature concerning any one of the methods, uses and compounds to be employed in accordance with the present invention may be retrieved from public libraries, using for example electronic devices. For example, the database “PubMed” (Sequeira et al., 2001), which is available on the Internet, may be utilised.

[0180] An overview of patent information in biotechnology and a survey of relevant sources of patent information useful for retrospective searching and for current awareness is given in Berks (1994).

FIGURES

[0181] FIG. 1. Schematic representation of the terms “Repeat Protein”, “Repeat Domains”, “Non-repeat Domain”, “Repeat Module”, “Capping Modules”, and “Linker”.

[0182] FIG. 2a. Examples of leucine-rich repeat proteins featuring only a repeat domain (1A4Y) or both a repeat domain and a non-repeat domain (1D0B).

[0183] FIG. 2b. Examples of ankyrin repeat proteins featuring only a repeat domain (1AWC) or both a repeat domain and a non-repeat domain (1DCQ).

[0184] FIG. 2c. Crystal Structure of the Pig Liver Ribonuclease Inhibitor (Kobe and Deisenhofer, 1993).

[0185] FIG. 2d. Crystal Structure of the Yeast rna1p GTPase-activating Protein (Hillig et al., 1999).

[0186] FIG. 2e. Crystal Structure of the Listeria InlB Protein (Marino et al., 1999).

[0187] FIG. 2f. Crystal Structure of the Human Spliceosomal Protein U2A' (Price et al., 1998).

[0188] FIG. 2g. Crystal Structure of the Human Transcription Factor Inhibitor IκBα (Huxford et al., 1998).

[0189] FIG. 2h. X-ray structure of the ankyrin repeat domain of the mouse GA-binding protein beta 1 subunit [pdb entry 1AWC (Batchelor et al., 1998)]. The N- and C-termini of the domain are labeled. This image has been created using MOLMOL (Koradi et al., 1996).

[0190] FIG. 3. Examples of naturally occurring repeat units and capping units. A leucine-rich repeat protein (1A4Y) and an ankyrin repeat protein (1AWC) are shown.

[0191]FIG. 4a. β/α-Fold of the LRR unit from Pig Ribonuclease Inhibitor(Residue 423 to 450).

[0192]FIG. 4b. Leucines and Positions of Amino Acids Emanating from theβ-strand of a LRR unit from Pig Ribonuclease Inhibitor (Residue 86 to112).

[0193]FIG. 4c. Structural description of an ankyrin repeat unit. A:Sideview. B: Topview. Interacting residues are depicted as “balls andsticks”. These pictures were made using the third repeat of theGA-binding protein (pdb entry 1AWC; Batchelor et al., 1998) displayedwith MOLMOL (Koradi et al., 1996).

[0194]FIG. 4d. A subset of the framework residues of a LRR unit is shownas “ball and sticks”. The numbering refers to the positions within a LRRunit.

[0195]FIG. 4e. A subset of the target interaction residues of a LRR unitis shown as “ball and sticks”. The numbering refers to the positionswithin a LRR unit.

[0196]FIG. 4f. A model of a LRR repeat module pair is shown. Thenumbering refers to the positions within the derived LRR repeat motifpair.

[0197]FIG. 5a. Internal Amino Acid Alignment of Human PlacentalRibonuclease Inhibitor

[0198]FIG. 5b. Consensus Defined on the Basis of all RibonucleaseInhibitor Sequences

[0199]FIG. 5c. Statistical analysis of the most frequent amino acids at each position in the A-type repeat units of mammalian RI.

[0200]FIG. 5d. Statistical analysis of the most frequent amino acids at each position in the B-type repeat units of mammalian RI.

[0201]FIG. 6. Restriction Enzyme Recognition Sites and Encoded Amino Acids. The DNA recognized by BssHII codes for alanine and arginine (A and R) in the first reading frame. Accordingly, MluI codes for threonine and arginine (T and R) in the first reading frame. Combination of DNA molecules cut with BssHII and MluI gives a new combined site not recognized by either restriction enzyme and coding for alanine and arginine (A and R).

[0202]FIG. 7a to 7c. Cloning of the library of repeat modules.

[0203]FIG. 8. DNA sequence and translated amino acids of the NcoI-HindIII insert in plasmid pTFT_N1CL. The abbreviation pTFT refers to all plasmids derived from pTFT74 (Ge et al., 1995). The abbreviation N1CL refers to an insert containing an N-terminal module, 1 repeat module, a C-terminal module, and a linker sequence.

[0204]FIG. 9a to 9c. Diagrams of plasmids pTFT_N, pQE_N1C, and pQE-pD_N2C. The nomenclature is as described in the caption of FIG. 8. The name of plasmids derived from pQE30 (Qiagen) always starts with pQE. The abbreviation pD refers to lambda phage protein D (Forrer and Jaussi, 1998).

[0205]FIG. 10. DNA sequence of the NcoI-HindIII insert of plasmid pQE_N4C clone D17.

[0206]FIG. 11a. High-level expression of randomly chosen members of the pD_N2C library (A2, A10, . . . ). XL1-Blue cells containing one of the library expression plasmids pQE-pD_N2C were grown at 37° C. to an OD₆₀₀ = 1 and induced for 1 h with 1 mM IPTG. The collected cells were resuspended in TBS₅₀₀, sonicated, and centrifuged. Samples corresponding to the supernatant (S) or pellet (P) of 40 microliters of cell culture were separated on a 15% SDS-PAGE and stained with Coomassie Blue. The clones are designated A2, A10, and so on. Ap1 and Ap2 are pools of 10 individual clones; Y: truncated pD_N2C (26 kDa), X: pD_N2C (33 kDa).

[0207]FIG. 11b. High-level expression of randomly chosen members of the N2C (C1, C2, . . . ) and pD_N4C (B9, B21) libraries as described in FIG. 11a; *: N2C (22 kDa), #: pD_N4C (45 kDa).

[0208]FIG. 11c. High-level expression of randomly chosen members of the N4C (D11, D15, . . . ) library as described in FIG. 11a; Z: N4C (34 kDa).

[0209]FIG. 11d. High-level expression of randomly chosen members of the N4C (D11, D15, . . . ) library as described in FIG. 11a but growth at 25° C.; Z: N4C (34 kDa).

[0210]FIG. 12a. Western blot analysis of high-level expression of members of the pD_N2C library (A2, A10, A15) after expression at either 25° C. or 30° C. Protein was prepared as for FIG. 11a. Antibody anti-RGS-His was used in 1:5000 dilution following the manufacturer's protocol (Qiagen); Y: truncated pD_N2C (26 kDa), X: pD_N2C (33 kDa).

[0211]FIG. 12b. Western blot analysis of high-level expression of randomly chosen members of the pD_N4C library (B9, B21, BP which is a pool) after expression at either 37° C. or 25° C.; #: pD_N4C (45 kDa).

[0212]FIG. 12c. Western blot analysis of high-level expression of some members of the N2C library (C1, C3, C7) after expression at either 37° C. or 25° C. Protein was prepared as for FIG. 11a. Antibody anti-Flag M2 was used in 1:1000 dilution following the manufacturer's protocol (Sigma); *: N2C (22 kDa).

[0213]FIG. 12d. Western blot analysis of high-level expression of some members of the N4C library (D17, D19, D22) after expression at either 37° C. or 25° C.; Z: N4C (34 kDa).

FIG. 13. His-tag purification under native conditions of a randomly chosen leucine-rich repeat protein according to the present invention. Lane M shows the molecular size marker (in kDa), lane FT shows the unbound fraction, and lanes 0 to 6 show different elution fractions. The arrow indicates the position of the expected protein.

[0214]FIG. 14. His-tag purification under denaturing conditions including refolding of the repeat proteins in the purification column. Lanes 1 to 6 show the unbound fractions of six leucine-rich repeat proteins according to the present invention. Lanes 7 to 12 show the peak elution fractions of the same six proteins. The arrow indicates the position of the expected proteins.

[0215]FIG. 15. Circular dichroism spectrometry of a randomly chosen leucine-rich repeat protein according to the present invention.

[0216]FIG. 16. Size exclusion chromatography of a randomly chosen leucine-rich repeat protein according to the present invention. The sample was analysed on a Superose 12 column.

[0217]FIG. 17. DNA recognition sequences of the restriction enzymes used for the cloning of ankyrin repeat proteins according to the present invention. Type II restriction enzymes cleave DNA within a palindromic recognition site, while type IIs restriction enzymes cut outside a non-palindromic recognition site. Two type IIs restriction enzymes (BpiI and BsaI) were used to ligate ankyrin repeat modules with each other in a directed manner by virtue of their compatible overhangs (see FIG. 18, Table 2 and Table 3), generating seamless connections of an ankyrin repeat module to the next one. These type IIs restriction enzymes were also used to link the N- and the C-terminal ankyrin capping modules with the ankyrin repeat modules separating them. BamHI and HindIII were used for the cloning of the ankyrin repeat proteins constructed according to the present invention (containing an N-terminal ankyrin capping module, two or more ankyrin repeat modules and a C-terminal ankyrin capping module) into plasmid pQE30 (QIAgen, Germany). The pattern of restriction is indicated for each enzyme by solid lines.

[0218]FIG. 18. Schematic view of the stepwise elongation of the N-terminal ankyrin capping module with ankyrin repeat modules at the DNA level. The N-terminal ankyrin capping module is elongated by ankyrin repeat modules to the required length, followed and ended by the addition of the C-terminal ankyrin capping module.

[0219]FIG. 19. Consensus “A” (obtained after SMART analysis), the consensus used for the BLAST search (circularly permutated consensus “A” where missing residues have been taken from a consensus of ankyrin repeat units of ankyrin repeat proteins with known three dimensional structure), consensus “B” (derived after BLAST search) as well as consensus “C” (finally obtained considering various parameters mentioned in EXAMPLE 2) are listed to illustrate the stepwise definition of the ankyrin repeat unit consensus. For consensus “A” and “B”, residues reaching 20% frequency at a given position are displayed. In consensus “C”, several amino acids are displayed at positions where these amino acids fitted equally well.

[0220]FIG. 20. The sequence of the ankyrin repeat motif (i.e. the basis of all ankyrin repeat modules of EXAMPLE 2) and the respective position numbers of the amino acids are displayed. In addition, the expected secondary structures (α meaning α-helix, β meaning β-sheet) are indicated. The six positions denoted “x” were defined as target interaction residues which were allowed to be any of the amino acids A, D, E, F, H, I, K, L, M, N, Q, R, S, T, V, W and Y. The remaining positions were defined to be framework residues defined by consensus “C” (cf. FIG. 19). At position 26, any of the three amino acids histidine, tyrosine or asparagine was allowed. For cloning reasons the ankyrin repeat motif is based on a circularly permutated consensus “C” (cf. FIG. 19). To match the consensus numbering scheme used in FIG. 19 and used by Sedgwick and Smerdon (1999), the numbers were circularly permutated in parallel with the consensus sequence.

[0221]FIG. 21. Alignment of the randomly chosen clone “E3-5” constructed according to the present invention. The amino acid sequence of E3-5, a protein having 3 ankyrin repeat modules (FIG. 20) between the N- and the C-terminal ankyrin capping modules, is aligned to mouse GA-binding protein beta 1. The latter is the protein showing highest homology to E3-5 among known ankyrin repeat proteins. The sequences were aligned using the command “gap” of GCG (Womble, D. D., 2000) with default values and the sequence comparison matrix Blosum62. Overall, the two molecules showed 67% residue identity and 71% residue homology. Positions corresponding to randomised positions in the repeat motif (cf. FIG. 20) are marked with an asterisk above. The N-terminal and C-terminal ankyrin capping modules are overlined, the three ankyrin repeat modules underlined.

[0222] FIG. 22. High-level expression of differently sized ankyrin repeat proteins generated according to the present invention [BamHI/HindIII cloned into plasmid pQE30 (QIAgen); expressed in E. coli XL1-Blue (Stratagene)]. Of each library of N2C, N3C and N4C, two randomly chosen clones were tested. The abbreviation N2C refers to an N-terminal ankyrin capping module, two ankyrin repeat modules and a C-terminal ankyrin capping module being connected using the cloning strategy stated in FIG. 17 and FIG. 18. N3C and N4C are named accordingly to their content of three or four ankyrin repeat modules between their N- and C-terminal ankyrin capping modules. Expression was performed as described in EXAMPLE 2.

[0223] Samples corresponding to 30 μl of culture were taken at various timepoints and separated on 15% SDS-PAGE (Coomassie stained). Lane 1: Molecular marker (size indicated in kDa); Lane 2-7: two N2C, two N3C and two N4C clones just before induction; Lane 8-13: same as lane 2-7 but after 2.5 hours induction; Lane 14-19: same as lane 2-7 but after 4 hours induction.

[0224]FIG. 23. His-tag purification of a randomly chosen ankyrin repeat protein generated according to the present invention. A 15% SDS-PAGE showing different fractions of the purification procedure is depicted. E3-5, an N3C clone, was expressed and purified as described in EXAMPLE 2. Lane 1 represents 0.6 μl of the collected cell lysate flow through which was not bound by the Ni-NTA columns. Lane 2 represents 0.6 μl of the first 800 μl column washing fraction. Lane 3 represents 0.6 μl of the last 800 μl washing fraction. Lanes 4, 5, 6, 7, 8 and 9 represent 0.6 μl of the subsequent elution steps (800 μl each) of the ankyrin repeat protein. Lane 10 shows the molecular marker (sizes in kDa).

[0225]FIG. 24. Size exclusion chromatography of a randomly chosen ankyrin repeat protein generated according to the present invention (E3-5, an N3C molecule; cf. FIG. 22). The sample was analysed on a Superdex 75 column (Amersham Pharmacia Biotech, USA) using a Pharmacia SMART system at a flow rate of 60 μl/min and TBS 150 (50 mM Tris-HCl, pH 7.5; 150 mM NaCl) as running buffer. Standards were β-amylase and the phage proteins SHP of phage 21 and pD of λ. The apparent masses of the standards are indicated in the figure. The apparent mass of 200 kDa for β-amylase is not indicated, as the protein eluted in the void volume.

[0226]FIG. 25. Circular dichroism spectra of a randomly chosen ankyrin repeat protein generated according to the present invention (E3-5, an N3C molecule). The spectra were recorded either in 10 mM sodium phosphate buffer pH 6.5 (native) or 20 mM sodium phosphate buffer pH 6.5 and 6 M guanidinium hydrochloride (denatured) using a Jasco J-715 instrument [Jasco, Japan; 10 nm/s, 8 sec response, 0.2 nm data pitch, 2 nm band width, 195-250 nm (native) or 212-250 nm (denatured), three accumulations, measurements in triplicate, 1 mm cuvette]. The CD signal was converted to mean residue ellipticity using the concentration of the sample determined spectrophotometrically at 280 nm under denaturing conditions. E3-5 shows an alpha-helical spectrum under native conditions with minima at 208 nm and 222 nm. The secondary structure is lost in 6 M guanidinium hydrochloride.

[0227]FIG. 26. Denaturation behaviour of randomly chosen ankyrin repeat proteins generated according to the present invention (cf. FIG. 22). The CD values at 220 nm are shown over guanidinium hydrochloride concentration for the different proteins. The different proteins were incubated with different concentrations of guanidinium hydrochloride in 20 mM NaPO4 pH 6.5, 100 mM NaCl, overnight at room temperature. The circular dichroism signal at 220 nm was measured for each sample in triplicate (conditions as indicated in EXAMPLE 2). The secondary structure is lost only at high concentrations of denaturing agent, indicating a high stability of the tested proteins.

[0228]FIG. 27. Crystals of a randomly chosen ankyrin repeat protein generated according to the present invention (E3-5, an N3C library member of FIG. 22). The crystal was grown in five days at 20° C. in 20% PEG 6000, 100 mM MES/NaOH pH 6.0, hanging droplet (2 μl protein and 2 μl buffer mixed; 500 μl buffer reservoir) from a solution of 9 mg protein per ml in TBS 50 (50 mM TrisHCl, pH 8.0, 50 mM NaCl).

[0229] The examples illustrate the invention.

EXAMPLES

[0230] Unless stated otherwise in the examples, all recombinant DNA techniques are performed according to described protocols (Sambrook et al., 1989 or Ausubel et al., 1994). Databases used were

[0231] Genbank

[0232] National Center for Biotechnology Information, National Library of Medicine, Bethesda, USA

[0233] Swiss-Prot

[0234] Swiss Institute of Bioinformatics, Geneva, Switzerland

[0235] Protein Data Bank

[0236] Center for Molecular Biophysics and Biophysical Chemistry at Rutgers, New Jersey, USA

[0237] Simple Modular Architecture Research Tool (SMART)

[0238] EMBL, Heidelberg, Germany

1. Collection of Repeat Proteins Comprising Repeat Modules Derived from Repeat Units of Mammalian Ribonuclease Inhibitors

[0239] This example describes the construction of a collection of leucine-rich repeat proteins derived from mammalian ribonuclease inhibitors (RI). This scaffold was chosen, since extraordinarily tight interactions in the femtomolar range have been reported for the binding of angiogenin by RI (Lee et al., 1989) and RNase A by RI (Kobe and Deisenhofer, 1996).

[0240] As the RI amino acid sequence showed a characteristic pattern of two alternating, different but homologous repeat units, termed A- and B-type LRR repeat unit (Kobe and Deisenhofer, 1994), two corresponding repeat motifs were derived and used to build a repeat domain. The assembly of a LRR repeat motif of type A with a LRR repeat motif of type B is henceforth referred to as RI repeat motif pair. A model of a repeat module pair comprising a RI repeat motif pair is shown (FIG. 4f). This example demonstrates the use of more than one repeat motif to build a repeat domain, which is in contrast to example 2 where only one repeat motif is used.

[0241] 1) Deriving Preliminary Repeat Sequence Motifs of Mammalian RI

[0242] The protein sequences of human RI (accession number P13489, Lee et al., 1988) and pig RI (P10775, Hofsteenge et al., 1988) were used to search for homologous sequences. The complete protein sequences of rat RI (P29315, Kawanomoto et al., 1992) and mouse RI (AAK68859, unpublished) were found.

[0243] The repeat units of the obtained RI protein sequences were aligned using “FastA” implemented in the GCG® Wisconsin Package™ (Accelrys, USA). The protein sequence of human RI is shown (FIG. 5a) and the LRR pattern characterised by leucines or other aliphatic residues at positions 2, 5, 7, 12, 20, and 24 (Kobe and Deisenhofer, 1994) is highlighted. The most abundant amino acid for each position was calculated for the human, mouse, pig, and rat RI sequences (FIG. 5c and 5d). A first RI repeat motif pair was defined by amino acids occurring in 50% or more of the cases at a given position (cf. FIG. 5c and 5d):

A-type LRR consensus (positions 1 to 28)
-LE-L-L--C-LT-A-C--L-SVL----

B-type LRR consensus (positions 1 to 29)
SL-EL-LS-N-LGD-G---LC-GL--P-C

[0244] For a threshold of 40% or more identical amino acids at a given position the RI repeat motif pair was defined by the following amino acid sequence:

A-type LRR consensus (positions 1 to 28)
+LE-L-L--C-LTAA-C-DL-SVLRAN-
where + is R or K

B-type LRR consensus (positions 1 to 29)
SL-EL-LS-N-LGDAG---LC-GL--P-C

[0245] Similarly, for a threshold of 30% or more identical amino acids at a given position the RI repeat motif pair was defined by the following amino acid sequence:

A-type LRR consensus (positions 1 to 28)
+LE-LWL-DCGLTAAGCKDLCSVLRAN-
where + is R or K

B-type LRR consensus (positions 1 to 29)
SLREL-LS*N-LGDAGV-LLCEGLL-P-C
where * is N or S

[0246] Finally, for a threshold of 25% or more identical amino acids at a given position the RI repeat motif pair was almost completely defined by only one amino acid per position:

A-type LRR consensus (positions 1 to 28)
+LEKLWLEDCGLTAAGCKDLCSVLRANP
where + is R or K

B-type LRR consensus (positions 1 to 29)
SLRELDLS*NELGDAGVRLLCEGLL#PGC
where * is N or S and # is D or Q

[0247] This is to illustrate how a sequence motif can be derived only from sequence information and alignment. However, structural information should preferably be taken into account.
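The thresholding used in paragraphs [0243] to [0246] can be sketched in a few lines of Python. This is only a minimal illustration and not the procedure used to produce FIG. 5c and 5d: the function name `threshold_consensus` and the four six-residue toy units are hypothetical, and, unlike the motifs above, the sketch reports only the single most frequent residue per position rather than alternatives such as “R or K”.

```python
from collections import Counter


def threshold_consensus(aligned_units, threshold):
    """Return a consensus string for equally long, aligned repeat units.

    A position shows its most frequent residue if that residue reaches
    the given frequency threshold (0.0 to 1.0), and '-' otherwise.
    """
    length = len(aligned_units[0])
    if any(len(unit) != length for unit in aligned_units):
        raise ValueError("all repeat units must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_units):
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue if count / len(column) >= threshold else "-")
    return "".join(consensus)


# Hypothetical toy alignment (NOT taken from the RI sequences).
toy_units = ["RLEKLW", "KLEELW", "RLETLH", "RLQALW"]

print(threshold_consensus(toy_units, 0.5))  # RLE-LW
print(threshold_consensus(toy_units, 1.0))  # -L--L-
```

Lowering the threshold fills in more positions, which mirrors the way the consensus above becomes almost fully defined at 25%.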

[0248] 2) Defining Framework and Target Interaction Residue Positions

[0249] The analysis of both A- and B-type LRR units revealed that the side chains of the amino acids at positions 2, 5, 7, 10, 12, 17, 20, and 24 are always oriented towards the hydrophobic core (Kobe and Deisenhofer, 1994 and FIG. 4b and 4d) and these amino acids constitute a subset of the framework residues. Other framework residues are the glycine at position 16 and the prolines at position 28 in the A-type LRR unit (abbreviated A28) and position 27 in the B-type LRR unit (abbreviated B27), since they initiate and terminate the α-helix of each LRR unit. Furthermore, positions A1, A3, A13, A18, A19, A22, A25, A27 and B1, B3, B11, B14, B18, B22, and B26 most often harbour hydrophilic amino acid residues oriented towards the surrounding solvent and were treated as framework positions. Similarly, positions A14, A15, A21, A23, A26, and B15, B19, B21, B25, and B29 are usually occupied by hydrophobic amino acid residues stabilising the interface of the repeat modules and are thus also treated as framework positions. Further, positions A11, B13, B23 and B28 feature glycines, which allow more flexibility than other amino acids and are therefore also important for the framework. In contrast, the positions 4, 6, 8, and 9 were defined to be the target interaction positions in the RI repeat motif pair.

[0250] 3) Replacing Unfavorable Amino Acids

[0251] The RI consensus is also characterised by extremely well conserved cysteines at positions A10 and A17 and positions B21 and B29. However, as free cysteines may be oxidised and cause complications, it is desirable to design cysteine free modules. Therefore, appropriate replacements were sought. Inspection of the three-dimensional structure (MTS#1) revealed that the cysteine at position A10 formed an H-bond. Further, alignments to more distant LRR molecules revealed the presence of either asparagine, serine, or threonine in most cases. Thus, the position A10 in the LRR module was designed to be occupied by one of these three amino acids. Similarly, position A17 was found to be part of the hydrophobic core, which is why in the LRR module methionine, leucine, or valine were used. At the same time, these two positions A10 and A17 constitute cases where framework positions are randomised. At position B21, the first and last repeats of all analysed RI sequences consistently carried leucine instead of cysteine (with one exception of valine), and the position was thus defined to be leucine in the final LRR module. In case of position B29, the choice was accordingly between serine and threonine, where the threonine was chosen to allow an assembly with the restriction endonuclease sites of BssHII and MluI (for a detailed description see FIG. 6).

[0252] The last remaining cysteine, which occurred at position A21 in 36% of the analysed cases (FIG. 5c), was replaced by alanine because this was the second most frequent amino acid at the given position and also seemed to match the hydrophobic environment. The decision was facilitated since it was noted that in most cases where leucine was found at position B21, position A21 was occupied by alanine. In other words, the leucine at position B21 seems to prefer alanine at position A21. Thus, stacking was believed to be supported best with this choice in the LRR module. Another decision was required for position A1. From the two possible positively charged amino acids lysine and arginine, the latter was chosen to match the above mentioned restriction endonuclease sites.

[0253] The refined repeat sequence motif can thus be described by the following sequence:

A-type LRR consensus (positions 1 to 28)
RLEKLWLED2GLTAAG4KDLASVLRANP
where 2 is N or S or T and 4 is L or M or V

B-type LRR consensus (positions 1 to 29)
SLRELDLS*NELGDAGVRLLLEGLL#PGT
where * is N or S and # is D or Q

[0254] 4) Defining the Target Interaction Residues

[0255] For the definition of the target interaction positions, both the human RI-angiogenin (Papageorgiou et al., 1997) and the pig RI-RNase A (Kobe and Deisenhofer, 1995) complexes were analysed. Apart from extensive interactions at both the N- and the C-terminal capping units, the interactions of repeat units involved most frequently positions 6, 8, and 9 of the A-type LRR unit, whereas in the B-type LRR unit, positions 4, 6 and 9 were used most often. All these positions are characterised by side chains emanating from the β-strand of the LRR unit (FIG. 4e) and are therefore suited for target interactions. However, as the glutamate at position 4 of the B-type LRR unit was present without exception and an additional structural importance could not be dismissed, we refrained from randomising this position. Thus, this position constitutes a case where a target interaction position is not randomised. In contrast, the position A4 was also defined to be randomised since it showed less than 30% conservation. Therefore positions A4, A6, A8, and A9 and positions B6 and B9 were randomised in the LRR module. The chosen subset of the amino acids at the randomised positions largely reflected the physicochemical properties of naturally occurring ones and therefore all charged and some H-bond forming and aromatic amino acids known to support binding in many instances were chosen. At the same time, the decision was taken to allow larger amino acids only in the A-type LRR unit positions and only smaller ones at the B-type LRR unit positions, minimising steric hindrance in an alternating context. Thus, the obtained repeat sequence motif at this stage can be described as follows:

A-type LRR consensus (positions 1 to 28)
RLE1L1L112GLTAAG4KDLASVLRANP
where 1 is D, E, N, Q, S, R, K, W, Y and 2 is N or S or T and 4 is L or M or V

B-type LRR consensus (positions 1 to 29)
SLREL3LS3NELGDAGVRLLLEGLL#PGT
where 3 is G, S, D, N, H or T and # is D or Q

[0256] Thus, randomisation at eight positions resulted in 2.8×10⁵ independent RI repeat module pairs. In other words, the synthesis of molecules satisfying the above described repeat sequence motif will create about 300000 independent but highly homologous members.

[0257] Another position analysed in detail is in the loop region on top of both LRR repeat units, namely position 11. Since the consensus at this position was 36% in the A-type and 25% in the B-type LRR unit, the occurrence of pairs of amino acids was checked. In the B-type LRR unit, charged amino acids were slightly preferred at position 11 and a lysine often occurred with an aspartate in the A-type LRR unit. This putative salt bridge was believed to increase stability and solubility of the designed LRR module and was therefore chosen. Another possibility (glycine at A11 and glutamate at B11) was dismissed for fear of too high flexibility.

[0258] The choice at position A14 was between alanine and glutamate, where the latter was again chosen to enhance the solubility and the correct orientation of the hydrophilic outer shell. Similarly, the position B22 suggested either glutamate or glutamine, where the latter was chosen since it seemed to better match the serine at A22 defined previously.

[0259] Finally, position 26 was subject to scrutiny, where the choice was between alanine at A26 together with glutamine at B26 on the one hand, and serine at A26 together with aspartate at B26 on the other. The latter variant was adopted to again enhance the solubility of the LRR module.

[0260] Thus, the RI repeat motif pair looks as follows (alterations are at positions A11, A14, A26, B11, B22, and B26):

A-type LRR consensus (positions 1 to 28)
RLE1L1L112DLTEAG4KDLASVLRSNP
where 1 is D, E, N, Q, S, R, K, W, Y and 2 is N or S or T and 4 is L or M or V

B-type LRR consensus (positions 1 to 29)
SLREL3LS3NKLGDAGVRLLLQGLLDPGT
where 3 is G, S, D, N, H or T

[0261] 5) Designing a Repeat Domain Derived from the LRR of Mammalian RI

[0262] Assembling multiple repeat modules into a domain is straightforward. Here, we undertook an approach involving two different restriction enzymes creating compatible overhangs (cf. FIG. 6). Thus, the direction of the ligation can simply be controlled by redigesting the ligation products, where only correctly ligated molecules are not cut.

[0263] Additionally, we chose to complement the assembled LRR modules by N- and C-terminal capping modules designed to shield the putative joint hydrophobic core of the repeat domain from the surrounding solvent. The analysis of the mammalian RI proteins revealed that the first and the last LRR units differed significantly from the consensus described above (FIG. 5c and 5d). For simplicity, the corresponding capping units of the human RI were cloned with slight modifications and are henceforth referred to as capping modules. Thus, amino acids 1 to 28 for the N-terminal capping module and amino acids 427 to 460 of human RI for the C-terminal module were used, and a short linker encoding the amino acid residues PYAR was introduced between the N-terminal capping module and the RI repeat module pairs to match the length requirements.

[0264] When the devised amino acid consensus was reverse translated into a DNA sequence the following parameters were taken into account: No undesired restriction enzyme recognition sequences were allowed within the repeat module pair and the codon usage was optimised for expression in E. coli.
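The first of these constraints can be checked mechanically. The following Python sketch is an assumption-laden illustration rather than the tool used here: the helper `find_internal_sites` and both test sequences are hypothetical, and only the recognition sequences of the enzymes named in this example are scanned. Because all five sites are palindromic, scanning one strand is sufficient. Codon usage optimisation is not covered.

```python
# Recognition sequences of the cloning enzymes named in this example.
# All five are palindromic, so one strand covers both orientations.
RESERVED_SITES = {
    "NcoI": "CCATGG",
    "BamHI": "GGATCC",
    "BssHII": "GCGCGC",
    "MluI": "ACGCGT",
    "HindIII": "AAGCTT",
}


def find_internal_sites(dna, sites=RESERVED_SITES):
    """Map each enzyme to the 0-based start positions of its site in dna."""
    dna = dna.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start)
            start = dna.find(site, start + 1)  # also finds overlapping hits
        if positions:
            hits[enzyme] = positions
    return hits


# Hypothetical sequences for illustration only.
print(find_internal_sites("ATGGGATCCAAA"))     # {'BamHI': [3]}
print(find_internal_sites("CGTCTGGAAAAACTG"))  # {}
```

A reverse-translated module would be accepted only if the returned dictionary is empty for the region between the intended flanking sites.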

[0265] 6) Preparation of the Expression Plasmids

[0266] To obtain the N-terminal module flanked by appropriate restriction digestion sites, the DNA of pTRP-PRI (Lee and Vallee, 1989) was amplified with oligonucleotides MTS2 and MTS4 (Table 1) giving PCR-fragment N. Thus, at the 5′-end, an NcoI and a BamHI site were introduced, whereas the 3′-end featured a BssHII and a HindIII site. The resulting DNA fragment is shown with translated amino acids in the correct frame (above the boxed part in FIG. 8).

[0267] The PCR-fragment N was ligated into the NcoI and HindIII sites of pTFT74 (Ge et al., 1995) yielding plasmid pTFT_N (FIG. 9a). At the same time, an N-terminal Flag-tag and a 6×His-tag were introduced. Several vectors were derived from pTFT_N for the insertion of the above described repeat modules. The NcoI-HindIII insert of pTFT_N was cloned into pQE60 (QIAgen, Hilden, Germany) prepared with the same restriction digestion enzymes (giving plasmid pQE_N). The BamHI-HindIII insert of pTFT_N (that is without N-terminal Flag-tag and the 6×His-tag) was cloned into a pQE60 derivative downstream of the lambda phage protein D gene insert in frame to yield a C-terminally fused repeat domain (giving plasmid pQE-pD_N). The pTFT derivatives feature a T7 polymerase promoter under a lac operator, whereas the pQE derivatives offer a T5 polymerase promoter under the same control system. Lambda phage protein D as N-terminal fusion partner was chosen to increase the solubility and expression (Forrer and Jaussi, 1998).

[0268] 7) Synthesis of the Repeat Module Libraries

[0269] Oligonucleotides MTS7 and MTS9 were partly assembled from trinucleotides (Virnekäs et al., 1994); all other oligonucleotides were synthesised with standard techniques.

[0270] The strategy presented below describes a way to obtain polymers of DNA fragments in a defined direction using palindromic restriction enzymes and ligation. One such possibility is to use the restriction enzymes BssHII and MluI (FIG. 6) which create compatible overhangs. If DNA fragments with the same overhang but different original recognition sites are religated, a new combined site (named * in FIG. 6 and FIG. 7a to 7c) will be formed which cannot be digested by either of the original enzymes (FIG. 6). However, the ligation of identical ends leads to the original recognition site and these molecules can therefore be distinguished by restriction digestion. Other pairs of restriction enzymes with compatible overhangs are well known to those skilled in the art.
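The behaviour of the junctions can be made concrete with a small Python sketch. It assumes the standard cut positions G^CGCGC for BssHII and A^CGCGT for MluI, both leaving a CGCG 5′ overhang; the helper names `junction`, `translate`, and `cleavable_by` are hypothetical and only the four codons occurring in these sites are tabulated.

```python
BSSHII = "GCGCGC"  # assumed cut: G^CGCGC
MLUI = "ACGCGT"    # assumed cut: A^CGCGT
OVERHANG = "CGCG"  # shared 5' overhang of both enzymes

# Only the codons that occur in the four possible junctions.
CODONS = {"GCG": "A", "CGC": "R", "ACG": "T", "CGT": "R"}


def junction(upstream_site, downstream_site):
    """Six-base site formed when an upstream fragment cut at upstream_site
    is ligated to a downstream fragment cut at downstream_site."""
    left = upstream_site[0]      # base 5' of the cut, kept on the upstream end
    right = downstream_site[-1]  # base 3' of the overhang on the downstream end
    return left + OVERHANG + right


def translate(site):
    """Translate the six bases in the first reading frame."""
    return "".join(CODONS[site[i:i + 3]] for i in range(0, 6, 3))


def cleavable_by(site):
    """Names of the two enzymes that still recognise the site."""
    return [name for name, s in (("BssHII", BSSHII), ("MluI", MLUI)) if s == site]


for up, down in ((BSSHII, MLUI), (MLUI, BSSHII), (BSSHII, BSSHII), (MLUI, MLUI)):
    site = junction(up, down)
    print(site, translate(site), cleavable_by(site))
# GCGCGT AR []
# ACGCGC TR []
# GCGCGC AR ['BssHII']
# ACGCGT TR ['MluI']
```

Under these assumptions a BssHII end joined upstream of an MluI end gives GCGCGT, which still encodes alanine and arginine but is resistant to both enzymes, whereas like-with-like ligation regenerates a cleavable site.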

[0271] The following step numbering refers to the one used in FIG. 7a to 7c.

[0272] (Step I) To obtain the first library of repeat modules the partly randomised oligonucleotides MTS7, MTS8, MTS9, and MTS10 were assembled by PCR and were amplified with a 10-fold molar excess of MTS11b and MTS14b in one step (95 degrees for 2 min; then 20 cycles of 95 degrees for 15 sec; 55 degrees for 15 sec, and 72 degrees for 20 sec followed by 72 degrees, 1 min). In case of the LRR library described here, the initial PCR assembles the above described A/B pair into one module. The resulting DNA fragment is shown with translated amino acids in the correct frame (boxed part in FIG. 8, the oligonucleotides are shown as arrows).

[0273] (Step II) Separate extensive restriction digestion with either BamHI and MluI or BssHII was followed by ligation with T4 ligase (1 hour at room temperature and heat inactivation of the enzyme). The resulting ligation product was purified by low melting point agarose gel electrophoresis. The band corresponding to the dimer repeat module was isolated and the DNA was recovered after β-agarase digestion by ethanol precipitation.

[0274] (Step III) To amplify the dimer of repeat modules a second PCR with primers T7pro and srpTFT1 (95 degrees for 2 min; then 15 cycles of 95 degrees for 15 sec; 50 degrees for 15 sec, 72 degrees for 40 sec followed by 72 degrees for 1 min) was performed. In case of the LRR library this step yielded two A/B pairs, that is four leucine-rich repeats. As 1 microgram template corresponding to about 10¹² molecules was used for the LRR library, the total theoretical diversity was still covered at this stage.
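The order of magnitude of the molecule count can be checked with a short Python sketch. Two inputs are assumptions that do not come from this example: an average mass of 650 g/mol per base pair of double-stranded DNA (a common approximation) and a hypothetical template length of 500 bp, since the exact fragment length is not stated here. The diversity of a dimer is estimated by squaring the figure of 2.8×10⁵ given above for one repeat module pair.

```python
AVOGADRO = 6.02214076e23  # molecules per mol
AVG_BP_MASS = 650.0       # g/mol per base pair (approximation, assumed)


def dsdna_molecules(mass_g, length_bp):
    """Approximate number of double-stranded DNA molecules in a sample."""
    molar_mass = length_bp * AVG_BP_MASS  # g/mol of one fragment
    return mass_g / molar_mass * AVOGADRO


template_mass_g = 1e-6     # 1 microgram, as stated in Step III
assumed_length_bp = 500    # hypothetical fragment length

molecules = dsdna_molecules(template_mass_g, assumed_length_bp)
dimer_diversity = (2.8e5) ** 2  # two independent module pairs

print(f"{molecules:.1e} template molecules")
print(f"{dimer_diversity:.1e} theoretical dimer variants")
print("diversity covered:", molecules > dimer_diversity)
```

With these assumptions the template count lies in the range of 10¹², above the squared pair diversity, which is consistent with the statement that the theoretical diversity was still covered.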

[0275] (Step IV) For the tetramer, the obtained DNA was again digested with either BamHI and MluI or BssHII. For longer polymers mixtures of single and doubly digested DNA fragments were prepared.

[0276] (Step V) The ligation, restriction digestion, and purification can be repeated until the desired number of repeat modules is obtained.

[0277] (Step VI) To obtain a DNA fragment with two different non-compatible restriction digestion sites at both ends for the directed and efficient cloning into a plasmid, the following “capping” strategy was devised. The C-terminal repeat unit of human RI was also amplified from plasmid pTRP-PRI by PCR thereby introducing a BssHII restriction site on the 5′-end and a HindIII restriction site at the 3′-end. The resulting DNA fragment is shown with translated amino acids in the correct frame (below the boxed part in FIG. 8).

[0278] The primers MTS5a and MTS3 were used in this PCR reaction (95 degrees for 2 min, then 20 cycles of 95 degrees for 15 sec, 45 degrees for 15 sec, and 72 degrees for 10 sec followed by 72 degrees for 1 min) and the product was QIAquick purified and restriction digested with BssHII.

[0279] (Step VII) The BssHII digested C-terminal repeat module was ligated to the MluI digested polymers by T4 ligase (1 hour at room temperature and heat inactivation of the enzyme). The subsequent extensive restriction digestion with BssHII, MluI, and HindIII ascertained the correct orientation of the modules. The mixture was separated by low melting point agarose gel electrophoresis and the desired bands were recovered as above. Finally, the recovered fragments were ligated into any of the BssHII-HindIII digested plasmids pTFT_N, pTFT-pD_N, pQE_N or pQE-pD_N. The resulting ligation mix was QIAquick purified and used for electroporation of XL10Gold cells prepared according to Sidhu et al. (2000).

[0280] The above described protocol results in different libraries of plasmids and two representative diagrams of such plasmids are shown (FIG. 9b and 9c).

[0281] 8) Characterization of the Repeat Module Protein Libraries

[0282] Standard DNA sequencing techniques were used to determine the DNA sequence of the expression plasmids. As an example, the DNA sequence of clone D17 (compare expression in FIG. 11c, 11d, and 12d) is given (FIG. 10a and 10b). The N-terminal module and the four repeat modules as well as the C-terminal module are indicated. Expression was essentially performed as described (QIAgen “QIAexpressionist”), and the soluble and insoluble proteins of single clones and/or pools of clones after sonication were separated by SDS-PAGE and Coomassie stained (FIG. 11a-d). Western blot analysis was performed according to the protocol supplied by the manufacturer. Antibody anti-Flag M2 (Sigma) was used for the constructs without N-terminal protein D, whereas anti-RGS-His (Qiagen) was used for constructs with N-terminal protein D (FIG. 12a-d).

[0283] Purifications (FIG. 13 and 14), CD spectrometry (FIG. 15) and size exclusion chromatography (FIG. 16) were carried out as described in example 2.

[0284] 9) Selection of (Poly)Peptide/Proteins which Inhibit Bacterial Toxins

[0285] Various bacterial toxins are known to occur together with the corresponding antitoxin because even a moderate level of toxin alone cannot be tolerated in bacteria. Therefore, the gene of CcdB (Jensen et al., 1995) was cloned into a low copy plasmid of the pZ series with a tightly repressed tetracycline promoter in a tetracycline repressor strain like DH5αZ1 (Lutz and Bujard, 1997), XL10Gold or XL1Blue. In parallel, wild-type barnase (Hartley, 1988) and the barnase H102K mutant with 0.1% activity (Jucovic and Hartley, 1996) were cloned. Chemically competent cells with one of these toxin plasmids were prepared as described (Inoue et al., 1990), and electroporation competent cells harbouring one of these toxin plasmids were prepared as described (Sidhu et al., 2000). For the selection of plasmids encoding a toxin inhibitor, cells were transformed with the LRR-based library, plated on selective plates (LB medium supplied with 50 mg/L ampicillin, 20 mg/L kanamycin, 40 micromolar IPTG, and 30 microgram/L anhydrotetracycline), and grown at either 25 or 37° C. To confirm that inhibitory properties are plasmid-linked, the pQE derivatives were reisolated and retransformed.

[0286] Screening for Efficiently Folding Constructs

[0287] GFP has been successfully used as a folding reporter when fused to the C-terminus of the target protein (Waldo et al., 1999). Rapidly aggregating targets do not allow folding of C-terminally fused GFP, and colonies can be screened in UV light. The fluorescence of GFP correlated with the amount of correctly folded protein. In our strategy, GFP was cloned into the NheI and EcoRI sites designed at the C-terminus obtained by PCR amplification using MTS5a and MTS6 and again pTRP-PRI as template. Hereby, a 12 amino acid linker GSAGSAAGSGEF was introduced. The resulting DNA fragment is shown with translated amino acids in the correct frame (at the bottom in FIG. 8).

[0288] Selection for Constructs Without Stop-codons

[0289] To reduce the number of frameshifts and stop-codons after the construction of the library, the constructs were cloned upstream of a linker connecting to the chloramphenicol resistance gene and viable clones were selected on plates.

[0290] Selection for Binding Targets Using Display Techniques

[0291] To identify binding partners in vitro, both ribosome display (Hanes et al., 1998) and phage display (Dunn, 1996) were used. Binding partners were RNase A and Onconase (Wu et al., 1993b) from the RNase superfamily and protein D (Forrer and Jaussi, 1998), an unrelated small polypeptide.

[0292] Selection for Binding Targets Using the Protein Complementation Assay

[0293] To identify binding partners, an E. coli genomic library was fused to the DHFR1 fragment (Pelletier et al., 1998), whereas the LRR-based library was fused next to DHFR2. Selection on M9 plates containing trimethoprim led to interacting molecules.

[0294] DNA Module Shuffling for the Improvement of the Obtained Constructs

[0295] For further evolutionary improvements, the obtained constructs were subjected to DNA shuffling (Stemmer, 1994) and back-crossing. Thus, improvements could be enriched and mutations without effect were lost.

Example 2 Collection of (Poly)Peptide/Proteins Comprising Repeat Modules Derived from Ankyrin Repeat Units

[0296] A method for the generation of designed ankyrin repeat proteins according to the present invention is described. The method allows the construction of ankyrin repeat proteins of various lengths by using an N-terminal ankyrin capping module, two or several ankyrin repeat modules and a C-terminal ankyrin capping module.

[0297] The definition of the ankyrin repeat motif which was the basis for the generation of a collection of ankyrin repeat modules in EXAMPLE 1 is described below. The analysis leading to the ankyrin repeat motif included a search of public databases for naturally existing ankyrin repeat proteins as well as structural analysis of ankyrin repeat proteins with known three-dimensional structure. By way of this analysis, a sequence motif for the ankyrin repeat modules was derived and ankyrin capping modules were derived. Furthermore, the positions of framework and target interaction residues were determined for the ankyrin repeat motif. To generate a library of ankyrin repeat modules, 17 out of 20 natural amino acids were allowed at the positions of target interaction residues in the ankyrin repeat motif. The positions of the framework residues were specified to certain amino acids each. The resulting peptide sequences were reverse translated such that the codon usage was optimal in Escherichia coli but did not create unwanted restriction sites. Oligonucleotides were designed to allow assembly PCR of the ankyrin repeat modules. Trinucleotide oligonucleotides (Virnekäs et al., 1994) as well as conventional oligonucleotides were used (Tables 2 and 3). Similarly, the N- and C-terminal ankyrin capping modules were generated by assembly PCR using conventional oligonucleotides (Table 2). The resulting PCR products all contained type IIs restriction enzyme recognition sites (FIG. 17) at those ends that subsequently would be connected to the DNA of the next/previous repeat (or capping) module (FIG. 18). When cut by the respective restriction enzymes, the generated compatible ends of the modules could be ligated in frame in a unidirectional way. Hence, the N-terminal ankyrin capping module could be ligated to one or several ankyrin repeat modules and the ligation products could be ligated to the C-terminal ankyrin capping module. As the DNA differed in defined positions, the method allowed the simultaneous assembly of a diverse set of DNA molecules encoding a collection of ankyrin repeat proteins. Members of the resulting collections of ankyrin repeat proteins were characterised by expression, purification, circular dichroism spectroscopy, denaturation experiments, size exclusion chromatography as well as crystallisation. The experiments demonstrated that unselected members of this ankyrin repeat protein library can be expressed in the reductive environment of the cytoplasm at high levels in a soluble and folded conformation.

[0298] Definition of the Ankyrin Repeat Motif Sequence

[0299] PROCEDURE and RESULT: The ankyrin repeat motif used as an example for the present invention was derived from ankyrin repeat protein sequence analysis as well as from structural analysis of ankyrin repeat proteins with known three-dimensional structure (date: August 2000).

[0300] The SMART database (Schultz et al., 2000) was first searched for amino acid sequences of ankyrin repeat units. A Clustal-W (Thompson et al., 1994) alignment of 229 ankyrin repeat units served as template for the determination of an ankyrin repeat unit consensus “A” (FIG. 19). Consensus “A” was determined by calculation of the residue-frequency occurrence for each position of the alignment of ankyrin repeat units. The 229 ankyrin repeat units considered did not contain inserts or deletions compared to a previously stated general ankyrin repeat unit consensus sequence (Sedgwick and Smerdon, 1999). Consensus “A”, however, included only residues 3 to 32 (FIG. 19) of the 33 amino acids long consensus sequence of Sedgwick and Smerdon (1999). To further refine the consensus and define the lacking positions, a BLAST (Altschul et al., 1990) search against GenBank (Benson et al., 2000) was performed using default parameters. For this search, consensus “A” was submitted in a circularly permutated form with position 20 as first amino acid (FIG. 19). The missing or ambiguous positions were filled with residues that had highest frequency in a consensus of ankyrin repeat units of ankyrin repeat proteins with known three-dimensional structure (manually aligned, statistics as described above). The first 200 of the resulting BLAST hits were manually aligned and the ankyrin repeat unit consensus “A” was refined by residue frequency analysis as stated above, yielding consensus “B” (FIG. 19). Consensus “B” was confirmed by an identical analysis of the Pfam database (Bateman et al., 1999; data not shown).
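
The residue-frequency step used to derive a consensus from an alignment can be illustrated as follows. This is a minimal sketch, not the procedure actually used: the four short sequences are made up for illustration and are not taken from SMART, and the `min_fraction` threshold (below which a position is reported as ambiguous, `x`) is an assumption introduced here.

```python
from collections import Counter


def consensus(aligned, min_fraction=0.0):
    """Return the most frequent residue for each column of an alignment.

    Columns whose top residue falls below min_fraction are marked 'x'.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        result.append(residue if count / len(aligned) >= min_fraction else "x")
    return "".join(result)


# Made-up toy alignment (hypothetical data, gap-free like the 229 units above).
toy_alignment = ["GKTPLH", "GRTPLH", "GKTALH", "GNTPLH"]

print(consensus(toy_alignment, min_fraction=0.5))
print(consensus(toy_alignment, min_fraction=0.75))
```

Raising the threshold turns weakly conserved columns into ambiguous positions, which corresponds to the "missing or ambiguous positions" that were subsequently filled from structural data.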

[0301] The final ankyrin repeat unit consensus “C” (FIG. 19) was obtained by integration of the methods mentioned in this paragraph. Published three-dimensional structures of ankyrin repeat proteins were visually inspected to further decide which amino acids were optimal at a certain position. The three-dimensional structure showing highest homology to ankyrin repeat unit consensus “B”, the mouse GA-binding protein beta 1 subunit (AC: 2981726, pdb: 1AWC; Batchelor et al., 1998), was the guideline in most instances, but other structures such as human p18 (AC: 4139830, pdb: 1IHB; Venkataramani et al., 1998) were also considered. The mutual dependence of pairs, triplets and quadruplets of amino acids in naturally occurring ankyrin repeat unit sequences was also used to further develop or assure consensus “B”. Furthermore, modeling approaches (insightII package; Informax Inc., USA) including homology modeling and energy minimisations have been performed, and the consensus sequence was developed towards optimal cavity avoidance and packing optimisation. It was further ensured that the secondary structure propensity (O'Neil and DeGrado, 1990; Chou and Fasman, 1978) of each residue of the consensus matched the secondary structure at the corresponding position in natural ankyrin repeat units. In addition, the secondary structure was analysed and verified using PhD-prediction (Rost, B., 1996). Protein stability and protease resistance of the consensus was then analysed using PEST (Rogers et al., 1986; Swiss Institute of Bioinformatics, Switzerland) and peptidesort of GCG (Accelrys, USA; Womble, D. D., 2000), and the consensus was predicted to be sufficiently stable.

[0302] Critical residues during the definition of ankyrin repeat unit consensus “B” (FIG. 19) to consensus “C” (FIG. 19) were positions 16, 17, 18, 19, 21, 22, 25 and 26. Position 16 was finally determined to be a histidine, since it makes buried H-bonds from its position to the previous repeat. The leucine at position 17 was finally preferred to other amino acids since it stabilises the interface of two repeat modules. The glutamate of position 18 was chosen as repeated glutamates and aspartates occur in human p18 at this position. Similarly, the glutamate at position 21 occurs in multiple successive copies in mouse GA-binding protein. Lysine 25 was preferred to other amino acids as the basic residues arginine and lysine occur repeatedly in mouse GA-binding protein as well. For position 26, the compromise of taking any of the three amino acids histidine, tyrosine or asparagine was chosen, as these amino acids all fulfil the requirements for this position. Accordingly, the positions 19 and 22 were occupied by isoleucine or valine and valine or leucine, respectively, as these residues fitted equally well.

[0303] The finally determined ankyrin repeat unit consensus “C” (FIG. 19) served as basis for the ankyrin repeat modules. The sequence of the ankyrin repeat motif is shown in FIG. 20. For cloning reasons the motif is based on a circularly permutated consensus “C”. In order to match the consensus numbering scheme used in FIG. 19 and used by Sedgwick and Smerdon (1999), the numbers of the positions in the ankyrin repeat motif were circularly permutated in parallel to the amino acid sequence. The ankyrin repeat motif has a length of 33 amino acids, whereof 27 positions were defined to be framework residues and 6 positions were defined as target interaction residues. The positions of framework residues were defined using ankyrin repeat unit consensus “C”. Analyses of three-dimensional structures showed that positions 2, 3, 5, 13, 14 and 33 of the ankyrin repeat units are often involved in protein-protein interactions and hence constitute the target interaction residues. This was also suggested by the high variability these positions showed during ankyrin repeat unit consensus definition. For the ankyrin repeat modules, these residues were defined to be any of the 17 amino acids A, D, E, F, H, I, K, L, M, N, Q, R, S, T, V, W and Y.

[0304] Thus, the number of independent members of the collection of ankyrin repeat modules can be calculated to be 3·17⁶ = 72′412′707.
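
The figure above can be reproduced with a few lines of arithmetic. In this sketch, the factor 3 is assumed to correspond to the three alternatives (H, Y or N) described for position 26; that attribution is an interpretation of the text, not a statement from it. The variable names are illustrative only.

```python
ALL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
ALLOWED = set("ADEFHIKLMNQRSTVWY")  # the 17 residues listed for target interaction positions
N_RANDOMISED_POSITIONS = 6          # positions 2, 3, 5, 13, 14 and 33
N_CHOICES_POSITION_26 = 3           # assumed source of the factor 3 (H, Y or N)

# Residues that are not allowed at the randomised positions.
excluded = sorted(ALL_AMINO_ACIDS - ALLOWED)

library_size = N_CHOICES_POSITION_26 * len(ALLOWED) ** N_RANDOMISED_POSITIONS

print("excluded residues:", ", ".join(excluded))
print("independent repeat modules:", library_size)
```

The set difference shows that the three residues left out of the 20 natural amino acids are cysteine, glycine and proline.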

[0305] Definition of the Ankyrin Capping Modules

[0306] PROCEDURE and RESULT: As the derived ankyrin repeat motif showed high homology to the beta 1 subunit of the mouse GA-binding protein (GABP beta 1; AC: 2981726; Batchelor et al., 1998), the N- and C-terminal ankyrin repeat capping units (repeats 1 and 5 according to Batchelor et al., 1998) of the latter protein were chosen as a basis for the N- and C-terminal capping modules. Both the N- and C-terminal ankyrin capping module had to be changed compared to the mouse GA-binding protein beta 1 capping units. The N-terminal GA-binding protein beta 1 capping unit was modified in its loop to sterically fit the design of the ankyrin repeat motif. The C-terminal GA-binding protein beta 1 capping unit was modified at several positions. Parts of the loop of repeat 4 and the beta hairpin connecting repeat 4 and 5 of GA-binding protein beta 1 (Batchelor et al., 1998) had to be included into the C-terminal capping module for cloning reasons. Thereby, the loop and the beta hairpin were modified to sterically fit the design of the ankyrin repeat motif. The modifications can be seen in FIG. 21, where GABP beta 1 is aligned to E3-5, a member of a protein library according to the present invention (see below).

[0307] Experimental Procedures

[0308] For all following sections of EXAMPLE A, techniques were performed according to protocols as described in Sambrook, J., Fritsch, E. F. and Maniatis, T. (1989; Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, New York) or in volumes 1 to 4 of Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A. and Struhl, K. (1994; Current protocols in molecular biology. John Wiley and Sons, Inc., New York) or in volumes 1 and 2 of Coligan, J. E., Dunn, B. M., Ploegh, H. L., Speicher, D. W. and Wingfield, P. T. (1995; Current protocols in protein science. John Wiley and Sons, Inc., New York).

[0309] Synthesis of DNA Encoding Ankyrin Repeat Proteins According to the Present Invention

[0310] PROCEDURE and RESULT: Oligonucleotides INT1 and INT2 were partly assembled from trinucleotides (Virnekäs et al., 1994) and were obtained from MorphoSys (Germany). All other oligonucleotides were synthesised with standard techniques and were from Microsynth (Switzerland, cf. Tables 2 and 3). Oligonucleotides for amplification of DNA were used at 100 μM stock concentration, while the ones used as templates were used as 10 μM stock. Enzymes and buffers were from New England Biolabs (USA) or Fermentas (Lithuania). The cloning strain was E. coli XL1-Blue (Stratagene).

[0311] The ankyrin repeat modules were generated by assembly PCR using oligonucleotides (1 μl each) INT1, INT2, INT3, INT4, INT5 and INT6a [5 min 95° C., 20·(30 sec 95° C., 1 min 50° C., 30 sec 72° C.), 5 min 72° C.] and Vent DNA polymerase in its standard buffer supplemented with additional 3.5 mM MgSO₄ in a final volume of 50 μl.

[0312] The N-terminal ankyrin capping module was prepared by assembly PCR using oligonucleotides (1 μl each) EWT1, EWT2, TEN3 and INT6 [5 min 95° C., 30·(30 sec 95° C., 1 min 40° C., 30 sec 72° C.), 5 min 72° C.] and Vent DNA polymerase in its standard buffer in 50 μl reaction volume. The resulting DNA was cloned via BamHI/HindIII into pQE30 (QIAgen, Germany). The DNA sequence was verified using standard techniques. The C-terminal ankyrin capping module was prepared accordingly, but by using oligonucleotides WTC1, WTC2, WTC3 and INT5.
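
Assembly PCR relies on the 3′ ends of neighbouring oligonucleotides being complementary to each other. As an illustrative check (a sketch, not part of the described procedure), the overlap between the assembly primer pairs EWT1/EWT2 and WTC1/WTC2 can be computed from the sequences listed in Table 3. The helper names `revcomp` and `overlap_3prime` are introduced here for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'-3'."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_3prime(fwd, rev):
    """Longest 3' suffix of fwd that pairs with the 3' end of rev."""
    target = revcomp(rev)
    for k in range(min(len(fwd), len(target)), 0, -1):
        if fwd.endswith(target[:k]):
            return target[:k]
    return ""


# Sequences as listed in Table 3 (5'-3').
EWT1 = "TTCCGCGGATCCGACCTGGGTAAGAAACTGCTGGAAGCTGCTCGTGCTGGTCAGGACGACGAAG"
EWT2 = "AACGTCAGCACCGTTAGCCATCAGGATACGAACTTCGTCGTCCTGACC"
WTC1 = "CTGACGTTAACGCTCAGGACAAATTCGGTAAGACCGCTTTCGACATCTCCATCGACAACGGTAACGAGG"
WTC2 = "TTGCAGGATTTCAGCCAGGTCCTCGTTACCGTTGTC"

for name, fwd, rev in [("EWT1/EWT2", EWT1, EWT2), ("WTC1/WTC2", WTC1, WTC2)]:
    shared = overlap_3prime(fwd, rev)
    print(name, len(shared), shared)
```

Both pairs share a 16-nucleotide complementary stretch at their 3′ ends, which allows the polymerase to extend each oligonucleotide along its partner.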

[0313] The ligation of the DNA encoding an ankyrin repeat protein from single ankyrin repeat modules and ankyrin repeat capping modules is represented schematically in FIG. 18. To assemble ankyrin repeat proteins, the cloned N-terminal ankyrin capping module was PCR-amplified using oligonucleotides TEN3 and INT6a (conditions as above for the N-terminal ankyrin capping module). The DNA was purified using the QIAquick DNA purification kit (QIAgen, Germany), cut with BsaI and repurified using the same kit. The N-terminal ankyrin capping module was then ligated onto BpiI cut and purified ankyrin repeat module. This directional cloning was possible since the cutting sequences of BpiI and BsaI, two type IIs restriction enzymes which recognise a DNA sequence different from the cutting sequence (FIG. 17), were chosen to be asymmetric but compatible with each other. The ligation product, termed N1, was gel-purified (LMP-agarose, β-agarase, sodium acetate/ethanol precipitation) and PCR-amplified using oligonucleotides (1 μl each) EWT3 and INT6b [5 min 95° C., 20·(30 sec 95° C., 30 sec 50° C., 30 sec 72° C.), 5 min 72° C.] and Vent DNA polymerase in its standard buffer in 50 μl reaction volume. The amplified product was purified using QIAquick, cleaved with BsaI and purified again. The subsequent ligation to BpiI cut ankyrin repeat modules started a new cycle of elongation which was repeated until the desired number of ankyrin repeat modules was added to the N-terminal ankyrin capping module (termed N2, N3, N4 etc.). DNA pieces corresponding to PCR-amplified N2, N3 and N4 were then cut with BsaI and ligated to a previously BpiI-cut PCR product of the cloned C-terminal ankyrin capping module. This yielded DNA molecules encoding N2C, N3C and N4C ankyrin repeat protein libraries. The final products were PCR amplified using 1 μl each of EWT3 and WTC3 [5 min 95° C., 25·(30 sec 95° C., 30 sec 50° C., 1 min 72° C.), 5 min 72° C.] and cloned via BamHI/HindIII into pQE30 (QIAgen).
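
The "asymmetric but compatible" property of the BpiI and BsaI ends can be illustrated from the primer sequences INT5 and INT6a in Table 2. The sketch below assumes the commonly documented cut positions of the two enzymes (BpiI: GAAGAC, top-strand cut 2 nucleotides downstream; BsaI: GGTCTC, top-strand cut 1 nucleotide downstream; both leaving a 4-nucleotide 5′ overhang). These cut positions are not stated in the text and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'-3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Recognition site and distance from the site to the cut on the same strand
# (assumed from standard enzyme descriptions, not from the specification).
TYPE_IIS = {"BpiI": ("GAAGAC", 2), "BsaI": ("GGTCTC", 1)}


def overhang(primer, enzyme):
    """4-nt 5' overhang left on the module side, read on the primer strand."""
    site, offset = TYPE_IIS[enzyme]
    start = primer.index(site) + len(site) + offset
    return primer[start:start + 4]


INT5 = "TTCCGCGGATCCTAGGAAGACCTGACGTTAACGCT"   # forward primer, BpiI site
INT6a = "TTTGGGAAGCTTCTAAGGTCTCACGTCAGCACCGT"  # reverse primer, BsaI site

bpi_end = overhang(INT5, "BpiI")
bsa_end = overhang(INT6a, "BsaI")

compatible = bpi_end == revcomp(bsa_end)    # a BsaI end can pair with a BpiI end
asymmetric = bpi_end != revcomp(bpi_end)    # an end cannot pair with its own copy

print(bpi_end, bsa_end, compatible, asymmetric)
```

Because the overhang is not palindromic, two BpiI-cut ends (or two BsaI-cut ends) cannot anneal to each other, so modules can only join head to tail, which is consistent with the unidirectional ligation described above.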

[0314] Protein Expression and Purification

[0315] PROCEDURE: E. coli XL1-Blue (Stratagene) was used as strain for the expression of ankyrin repeat proteins of different lengths. Two clones corresponding to N2C (named E2-5 and E2-17), two clones corresponding to N3C (E3-5 and E3-19) and two clones corresponding to N4C (E4-2 and E4-8) were randomly chosen and analysed further. 25 ml of stationary overnight cultures (LB, 1% glucose, 100 mg/l ampicillin; 37° C.) of these clones were used to inoculate 1 l cultures (same media as preculture). At OD₆₀₀=0.7, the cultures were induced with 300 μM IPTG and incubated for four hours. Samples were taken at various timepoints and analysed via SDS-PAGE (see FIG. 22). The cultures were centrifuged and the resulting pellets were taken up in 40 ml TBS₅₀₀ (50 mM TrisHCl, pH 8.0, 500 mM NaCl) and sonicated. Then the lysates were supplemented with 10% glycerol and 20 mM imidazole and recentrifuged. The resulting supernatant was used for purification over a His-tag column (2.5 cl column volume) according to the manufacturer (QIAgen, Germany).

[0316] RESULTS: Cell fractionation experiments showed that all ankyrin repeat proteins were expressed in soluble form with yields of 200 mg/l culture (FIG. 22). His-tag purification led to pure protein in a single purification step (FIG. 23). The integrity of the proteins was further confirmed by mass spectrometry (not shown). The soluble expression indicates proper folding of the designed repeat proteins.

[0317] Size Exclusion Chromatography

[0318] PROCEDURE: The six purified samples described above were analysed on a Superdex 75 column (Amersham Pharmacia Biotech, USA) using a Pharmacia SMART system at a flow rate of 60 μl/min and TBS 150 (50 mM TrisHCl, pH 7.5; 150 mM NaCl) as running buffer. Standards were □-amylase (Sigma) and the phage proteins pD and SHP (Yang et al., 2000). As an example, the elution profile of an N3C-library member, E3-5, is shown in FIG. 24.

[0319] RESULTS: The elution profile showed that the proteins investigated were in most cases exclusively monomeric, while a minor number of protein samples (E2-17 and E4-8) showed multimerised, but soluble species in addition to the monomers. The retention measured by gel filtration indicated that the investigated proteins are folded and not random coils.

[0320] CD Spectroscopy

[0321] PROCEDURE: The circular dichroism spectra of a randomly chosen ankyrin repeat protein generated according to the present invention (E3-5, an N3C molecule) were recorded either in 10 mM sodium phosphate buffer pH 6.5 (native) or 20 mM sodium phosphate buffer pH 6.5 and 6 M guanidinium hydrochloride (denatured) using a Jasco J-715 instrument [Jasco, Japan; 10 nm/s, 8 sec response, 0.2 nm data pitch, 2 nm band width, 195-250 nm (native) or 212-250 nm (denatured), three accumulations, measurements in triplicates, 1 mm cuvette]. The CD signal was converted to mean residue ellipticity using the concentration of the sample determined spectrophotometrically at 280 nm under denaturing conditions.
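
The conversion of a raw CD signal to mean residue ellipticity follows the standard relation [θ] = θ / (10 · c · l · n), with θ in millidegrees, c in mol/l, l in cm and n the number of residues, giving deg·cm²·dmol⁻¹. The sketch below applies this relation together with the Beer-Lambert law for the concentration. All numerical inputs (absorbance, extinction coefficient, signal, residue count) are hypothetical values for illustration and are not measurements from this example; only the 1 mm path length is taken from the text.

```python
def concentration_molar(a280, extinction_m1_cm1, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l)."""
    return a280 / (extinction_m1_cm1 * path_cm)


def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Mean residue ellipticity in deg cm^2 dmol^-1 from a signal in mdeg."""
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
conc = concentration_molar(a280=0.15, extinction_m1_cm1=15000.0)  # 1e-5 M
mre = mean_residue_ellipticity(
    theta_mdeg=-30.0,   # assumed raw signal at 222 nm
    conc_molar=conc,
    path_cm=0.1,        # 1 mm cuvette, as stated in the procedure
    n_residues=150,     # assumed chain length
)
print(f"c = {conc:.2e} M, MRE = {mre:.0f} deg cm^2 dmol^-1")
```

Dividing by the number of residues makes spectra of proteins with different lengths, such as N2C, N3C and N4C molecules, directly comparable.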

[0322] RESULTS: E3-5 shows an alpha-helical spectrum under native conditions with minima at 208 nm and 222 nm. The secondary structure is lost in 6 M guanidinium hydrochloride (FIG. 25). This indicates the proper formation of secondary structure elements in E3-5.

[0323] Denaturation Behaviour

[0324] PROCEDURE: The denaturation behaviour of randomly chosen ankyrin repeat proteins generated according to the present invention (E2-5, E3-5 and E4-8, FIG. 22) was measured via circular dichroism spectroscopy basically as indicated in FIG. 25 but using different buffers. Guanidinium hydrochloride denaturation curves were measured by CD spectroscopy at 220 nm using the different proteins incubated in different concentrations of guanidinium hydrochloride in 20 mM NaPO₄ pH 6.5, 100 mM NaCl, overnight at room temperature. The circular dichroism signal at 220 nm was measured for each sample in triplicates.

[0325] RESULTS: The denaturation curves of E2-5, E3-5 and E4-8 against different concentrations of guanidinium hydrochloride are shown in FIG. 26. The midpoint of denaturation is in a range of 2.5 to 3.8 M guanidinium hydrochloride. Hence, the secondary structure is lost only at high concentrations of denaturing agent, indicating a relatively high stability of the investigated molecules.
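
One simple way to read a midpoint of denaturation from such a curve is to find the denaturant concentration at which the signal is halfway between the native and the denatured plateau. The sketch below does this by linear interpolation. It is a simplification that assumes flat baselines and is not the evaluation method of this example (the text does not state how the midpoints were determined); the data points are invented for illustration.

```python
def denaturation_midpoint(conc, signal):
    """Denaturant concentration where the signal is halfway between
    its first (native) and last (denatured) value, by linear interpolation."""
    half = (signal[0] + signal[-1]) / 2.0
    points = list(zip(conc, signal))
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s0 != s1 and (s0 - half) * (s1 - half) <= 0:
            return c0 + (half - s0) * (c1 - c0) / (s1 - s0)
    raise ValueError("signal never crosses the half-way value")


# Invented example curve: CD signal at 220 nm (mdeg) versus denaturant (M).
gdn_hcl = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cd_220 = [-20.0, -20.0, -19.0, -12.0, -3.0, -2.0, -2.0]

midpoint = denaturation_midpoint(gdn_hcl, cd_220)
print(f"midpoint = {midpoint:.2f} M")
```

A fit to a two-state unfolding model with sloping baselines would be the more rigorous alternative when the plateaus are not flat.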

[0326] Crystallisation

[0327] PROCEDURE and RESULT: The ankyrin repeat protein E3-5, an N3C library member according to the present invention, was crystallised in 20% PEG 6000, 100 mM MES/NaOH pH 6.0 in five days at 20° C., hanging droplet (2 μl protein and 2 μl buffer mixed; 500 μl buffer reservoir) from a solution of 9 mg protein per ml in TBS 50 (50 mM TrisHCl, pH 8.0, 50 mM NaCl; cf. FIG. 27). The crystal diffracted to 3 Å in preliminary X-ray experiments (not shown).

[0328] Tables

[0329] Table 1: Oligonucleotides used for the cloning of the library derived from human RI;

[0330] Table 2: Oligonucleotides used for the generation of ankyrin repeat modules according to example 2;

[0331] Table 3: Oligonucleotides used for the generation of the N- and C-terminal ankyrin capping modules as well as for the cloning of ankyrin repeat proteins containing more than one ankyrin repeat module according to the present invention.

TABLE 1 Oligonucleotides used for the cloning of the library derived from human RI

Name | Sequence in 5′-3′ direction (restriction sites)¹ | Description
MTS2 | CATGCCATGGACTACAAGGATCATCACCATCACCATCACGGATCCctggacatccag (NcoI, BamHI) | fwd² PCR primer to obtain human RI with initial Flag-tag MDYKD and 6xHis-tag
MTS4 | GCATAAGCTTATCACTCGAGGCGCGCGTAGGGctgctggagcagagg (HindIII, XhoI, BssHII) | rev² PCR primer to obtain N-term. RI unit
MTS3 | GCATAAGCTTATCAggagatgaccc (HindIII) | rev² PCR primer to obtain human RI
MTS5a | CATGCCATGGGcgcgCctcgagcagctggtcc (NcoI, BssHII, XhoI) | fwd² PCR primer for new C-term. unit
MTS7 | TTGGCGCGCCTGGAGNNNCTGNNNCTGNNNNNNNNNgacctcaccgaggccggc (BssHII)³ | fwd² assembly left, 4 library elements, 1 codon for S, N, T
MTS8 | ccgcaggctcgggttggaGCGGAGCACGCTGGCCAGGTCCTTCANgccggcctcggtgaggtc | rev² assembly left, 1 codon for L, M, V
MTS9 | tccaacccgagcctgcggGAGCTGNNNCTGAGCNNNaacaagctcggcgatgca | fwd² assembly right, 2 library elements
MTS10 | CCGCTCGAGACGCGTGCCGGGGTCCAGCAGCCCCTGCAAGAGCAGCCGCACGCCtgcatcgccgagcttgtt (XhoI) | rev² assembly right
MTS11b | TAATACGACTCACTATAGGGttggcgcgcctggag (BssHII) | fwd² PCR primer to amplify the assembly
MTS14b | GGCTTTGTTAGCAGCCGGATCctcgagacgcgtgccggggtc (BamHI, XhoI, MluI) | rev² PCR primer to amplify the assembly
T7pro | AAATtaatacgactcactataggg | fwd² PCR primer to amplify library dimer
srpTFT1 | CGggctttgttagcagccgg | rev² PCR primer to amplify library dimer

[0332] TABLE 2 Oligonucleotides used for the generation of ankyrin repeat modules according to example 2

Name | Sequence in 5′-3′ direction (restriction sites) | Description
INT1 | CTGACGTTAACGCTNNNGACNNNNNNGGTNNNACTCCGCTGCACCTGGC¹ | Forward primer (1) for the assembly of ankyrin repeat modules
INT2 | ACTCCGCTGCACCTGGCTGCTNNNNNNGGTCACCTGGAAATCG¹ | Forward primer (2) for the assembly of ankyrin repeat modules
INT3 | AACGTCAGCACCGTDCTTCAGCAGAACTTCAACGATTTCCAGGTGACC² | Reverse primer (1) for the assembly of ankyrin repeat modules
INT4 | AGCAGCCAGGTGCAGCGGAGT | Reverse primer (2) for the assembly of ankyrin repeat modules
INT5 | TTCCGCGGATCCTAGGAAGACCTGACGTTAACGCT (BamHI, BpiI) | Forward primer for ankyrin repeat module and C-terminal ankyrin capping module amplification (BpiI)
INT6a | TTTGGGAAGCTTCTAAGGTCTCACGTCAGCACCGT (HindIII, BsaI) | Reverse primer for ankyrin repeat module and N-terminal ankyrin capping module amplification (BsaI)

[0333] TABLE 3 Oligonucleotides used for the generation of the N- and C-terminal ankyrin capping modules as well as for the cloning of ankyrin repeat proteins containing more than one ankyrin repeat module

Name | Sequence in 5′-3′ direction (restriction sites) | Description
INT6b | TTTGGGAAGCTTCTAAGGTCTC (HindIII, BsaI) | Reverse primer for the amplification of ankyrin repeat modules having an INT6a sequence at the 3′ end
INT6 | TTTGGGAAGCTTCTAGAAGACAACGTCAGCACCGT (HindIII, BpiI) | Reverse primer for amplification of the N-terminal ankyrin capping module (BpiI)
EWT1 | TTCCGCGGATCCGACCTGGGTAAGAAACTGCTGGAAGCTGCTCGTGCTGGTCAGGACGACGAAG | Forward primer for the assembly of the N-terminal ankyrin capping module
EWT2 | AACGTCAGCACCGTTAGCCATCAGGATACGAACTTCGTCGTCCTGACC | Reverse primer for the assembly of the N-terminal ankyrin capping module
EWT3 | TTCCGCGGATCCGACCTGGG (BamHI) | Forward primer (1) for the amplification of sequences containing the N-terminal ankyrin capping module
TEN3 | TTCCGCGGATCCG (BamHI) | Forward primer (2) for the amplification of sequences containing the N-terminal ankyrin capping module
WTC1 | CTGACGTTAACGCTCAGGACAAATTCGGTAAGACCGCTTTCGACATCTCCATCGACAACGGTAACGAGG | Forward primer for the assembly of the C-terminal ankyrin capping module
WTC2 | TTGCAGGATTTCAGCCAGGTCCTCGTTACCGTTGTC | Reverse primer for the assembly of the C-terminal ankyrin capping module
WTC3 | TTTGGGAAGCTTCTATTGCAGGATTTCAGC (HindIII) | Reverse primer (1) for the amplification of sequences containing the C-terminal ankyrin capping module

REFERENCES

[0334] Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J Mol Biol 215, 403-410.

[0335] Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A. and Struhl, K. eds. (1999). Current Protocols in Molecular Biology. New York: John Wiley and Sons.

[0336] Batchelor, A. H., Piper, D. E., de la Brousse, F. C., McKnight, S. L., and Wolberger, C. (1998). The structure of GABPalpha/beta: an ETS domain-ankyrin repeat heterodimer bound to DNA. Science 279, 1037-1041.

[0337] Bateman, A., Birney, E., Durbin, R., Eddy, S. R., Finn, R. D., and Sonnhammer, E. L. (1999). Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. Nucleic Acids Res 27, 260-262.

[0338] Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., Rapp, B. A., and Wheeler, D. L. (2000). GenBank. Nucleic Acids Res 28, 15-18.

[0339] Berks, A. H. (1994). Patent information in biotechnology. Trends Biotechnol. 12, 352-64.

[0340] Blázquez, M., Fominaya, J. M., and Hofsteenge, J. (1996). Oxidation of sulfhydryl groups of ribonuclease inhibitor in epithelial cells is sufficient for its intracellular degradation. J Biol Chem 271, 18638-18642.

[0341] Bork, P. (1993). Hundreds of ankyrin-like repeats in functionally diverse proteins: mobile modules that cross phyla horizontally? Proteins 17, 363-374.

[0342] Breeden, L., and Nasmyth, K. (1987). Similarity between cell-cycle genes of budding yeast and fission yeast and the Notch gene of Drosophila. Nature 329, 651-654.

[0343] Chen, C. Z., and Shapiro, R. (1997). Site-specific mutagenesis reveals differences in the structural bases for tight binding of RNase inhibitor to angiogenin and RNase A. Proc Natl Acad Sci USA 94, 1761-1766.

[0344] Chou, P. Y., and Fasman, G. D. (1978). Prediction of the secondary structure of proteins from their amino acid sequence. Adv Enzymol Relat Areas Mol Biol 47, 45-148.

[0345] Dunn, I. S. (1996). Phage display of proteins. Curr Opin Biotechnol 7, 547-553.

[0346] Forrer, P., and Jaussi, R. (1998). High-level expression of soluble heterologous proteins in the cytoplasm of Escherichia coli by fusion to the bacteriophage lambda head protein D. Gene 224, 45-52.

[0347] Ge, L., Knappik, A., Pack, P., Freund, C. and Plückthun, A. (1995). Expressing antibodies in Escherichia coli. Antibody Engineering. A Practical Approach (Ed. C. A. K. Borrebaeck). IRL Press, Oxford, pp. 229-266.

[0348] Gorina, S., and Pavletich, N. P. (1996). Structure of the p53 tumor suppressor bound to the ankyrin and SH3 domains of 53BP2. Science 274, 1001-1005.

[0349] Groves, M. R. and Barford, D. (1999). Topological characteristics of helical repeat proteins. Curr Opin Struct Biol 9, 383-389.

[0350] Hanes, J., Jermutus, L., Weber-Bornhauser, S., Bosshard, H. R., and Plückthun, A. (1998). Ribosome display efficiently selects and evolves high-affinity antibodies in vitro from immune libraries. Proc Natl Acad Sci USA 95, 14130-14135.

[0351] Hartley, R. W. (1988). Barnase and Barstar. Expression of its cloned inhibitor permits expression of a cloned ribonuclease. J Mol Biol 202, 913-915.

[0352] Hiatt, A. and Ma, J. K. (1993). Characterization and applicationsof antibodies produced in plants. Int Rev Immunol 10, 139-152.

[0353] Hiatt, A. (1990). Antibodies produced in plants. Nature 344,469-470.

[0354] Hillig, R. C., Renault, L., Vetter, I. R., Drell, T. t.,Wittinghofer, A., and Becker, J. (1999). The crystal structure of rna1p:a new fold for a GTPase-activating protein. Mol Cell 3, 781-791.

[0355] Hochuli, E., Bannwarth, W., Döbeli, H., Gentz, R. and Stüber, D.(1988). Genetic approach to facilitate purification of recombinantproteins with a novel metal chelate adsorbent. Bio/Technology 6,1321-1325.

[0356] Hofsteenge, J., Kieffer, B., Matthies, R., Hemmings, B. A., andStone, S. R. (1988). Amino acid sequence of the ribonuclease inhibitorfrom porcine liver reveals the presence of leucine-rich repeats.Biochemistry 27, 8537-8544.

[0357] Hopp, T. P., Prickett, K. S., Price, V. L., Libby, R. T., March,C. J., Cerretti, D. P., Urdal, D. L. and Conlon, P. J. (1988). A shortpolypeptide marker sequence useful for recombinant proteinidentification and purification. Bio/Technology 6, 1204-1210.

[0358] Horwitz, A. H., Chang, C. P., Better, M., Hellstrom, K. E. and Robinson, R. R. (1988). Secretion of functional antibody and Fab fragment from yeast cells. Proc Natl Acad Sci USA 85, 8678-8682.

[0359] Huxford, T., Huang, D. B., Malek, S., and Ghosh, G. (1998). The crystal structure of the IκBα/NF-κB complex reveals mechanisms of NF-κB inactivation. Cell 95, 759-770.

[0360] Inoue, H., Nojima, H., and Okayama, H. (1990). High efficiency transformation of Escherichia coli with plasmids. Gene 96, 23-28.

[0361] Jacobs, M. D. and Harrison, S. C. (1998). Structure of an IκBα/NF-κB complex. Cell 95, 749-758.

[0362] Jeffrey, P. D., Tong, L., and Pavletich, N. P. (2000). Structural basis of inhibition of CDK-cyclin complexes by INK4 inhibitors. Genes Dev 14, 3115-3125.

[0363] Jensen, R. B., Grohmann, E., Schwab, H., Diaz-Orejas, R., and Gerdes, K. (1995). Comparison of ccd of F, parDE of RP4, and parD of R1 using a novel conditional replication control system. Mol Microbiol 17, 211-220.

[0364] Jucovic, M. and Hartley, R. W. (1996). Protein-protein interaction: a genetic selection for compensating mutations at the barnase-barstar interface. Proc Natl Acad Sci USA 93, 2343-2347.

[0365] Kajava, A. V. (1998). Structural diversity of leucine-rich repeat proteins. J Mol Biol 277, 519-527.

[0366] Kay, B. K., Winter, J. and McCafferty, J., eds. (1996). Phage display of peptides and proteins: a laboratory manual. Academic Press, Inc., San Diego.

[0367] Kawanomoto, M., Motojima, K., Sasaki, M., Hattori, H., and Goto, S. (1992). cDNA cloning and sequence of rat ribonuclease inhibitor, and tissue distribution of the mRNA. Biochim Biophys Acta 1129, 335-338.

[0368] Kirkham, P. M., Neri, D., and Winter, G. (1999). Towards the design of an antibody that recognises a given protein epitope. J Mol Biol 285, 909-915.

[0369] Knappik, A. and Plückthun, A. (1994). An improved affinity tag based on the FLAG peptide for detection and purification of recombinant antibody fragments. BioTechniques 17, 754-761.

[0370] Kobe, B., and Deisenhofer, J. (1993). Crystal structure of porcine ribonuclease inhibitor, a protein with leucine-rich repeats. Nature 366, 751-756.

[0371] Kobe, B. and Deisenhofer, J. (1994). The leucine-rich repeat: a versatile binding motif. Trends Biochem Sci 19, 415-421.

[0372] Kobe, B., and Deisenhofer, J. (1995). A structural basis of the interactions between leucine-rich repeats and protein ligands. Nature 374, 183-186.

[0373] Kobe, B. (1996). Leucines on a roll. Nat Struct Biol 3, 977-980.

[0374] Kobe, B., and Deisenhofer, J. (1996). Mechanism of ribonuclease inhibition by ribonuclease inhibitor protein based on the crystal structure of its complex with ribonuclease A. J Mol Biol 264, 1028-1043.

[0375] Kobe, B. and Kajava, A. V. (2000). When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem Sci 25, 509-515.

[0376] Koradi, R., Billeter, M., and Wüthrich, K. (1996). MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graph 14, 51-55, 29-32.

[0377] Ku, J., and Schultz, P. G. (1995). Alternate protein frameworks for molecular recognition. Proc Natl Acad Sci USA 92, 6552-6556.

[0378] Lee, F. S., and Vallee, B. L. (1989). Expression of human placental ribonuclease inhibitor in Escherichia coli. Biochem Biophys Res Commun 160, 115-120.

[0379] Lee, F. S., Auld, D. S., and Vallee, B. L. (1989). Tryptophan fluorescence as a probe of placental ribonuclease inhibitor binding to angiogenin. Biochemistry 28, 219-224.

[0380] Lee, F. S., Fox, E. A., Zhou, H. M., Strydom, D. J., and Vallee, B. L. (1988). Primary structure of human placental ribonuclease inhibitor [published erratum appears in Biochemistry 1989 Aug 22;28(17):7138]. Biochemistry 27, 8545-8553.

[0381] Lindner, P., Guth, B., Wülfing, C., Krebber, C., Steipe, B., Müller, F. and Plückthun, A. (1992). Purification of native proteins from the cytoplasm and periplasm of Escherichia coli using IMAC and histidine tails: a comparison of proteins and protocols. Methods: A Companion to Methods Enzymol. 4, 41-56.

[0382] Lutz, R. and Bujard, H. (1997). Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O, and AraC/I₁-I₂ regulatory elements. Nucleic Acids Res 25, 1203-1210.

[0383] Lux, S. E., John, K. M., and Bennett, V. (1990). Analysis of cDNA for human erythrocyte ankyrin indicates a repeated structure with homology to tissue-differentiation and cell-cycle control proteins. Nature 344, 36-42.

[0384] Malek, S., Huxford, T., and Ghosh, G. (1998). Ikappa Balpha functions through direct contacts with the nuclear localization signals and the DNA binding sequences of NF-kappaB. J Biol Chem 273, 25427-25435.

[0385] Marino, M., Braun, L., Cossart, P., and Ghosh, P. (1999). Structure of the InlB leucine-rich repeats, a domain that triggers host cell invasion by the bacterial pathogen L. monocytogenes. Mol Cell 4, 1063-1072.

[0386] Nygren, P. A., and Uhlen, M. (1997). Scaffolds for engineering novel binding sites in proteins. Curr Opin Struct Biol 7, 463-469.

[0387] Marino, M., Braun, L., Cossart, P., and Ghosh, P. (2000). A framework for interpreting the leucine-rich repeats of the Listeria internalins. Proc Natl Acad Sci USA 97, 8784-8788.

[0388] Nyyssönen, E., Penttila, M., Harkki, A., Saloheimo, A., Knowles, J. K. and Keranen, S. (1993). Efficient production of antibody fragments by the filamentous fungus Trichoderma reesei. Bio/Technology 11, 591-595.

[0389] O'Neil, K. T., and DeGrado, W. F. (1990). A thermodynamic scale for the helix-forming tendencies of the commonly occurring amino acids. Science 250, 646-651.

[0390] Papageorgiou, A. C., Shapiro, R., and Acharya, K. R. (1997). Molecular recognition of human angiogenin by placental ribonuclease inhibitor: an X-ray crystallographic study at 2.0 Å resolution. EMBO J 16, 5162-5177.

[0391] Pelletier, J. N., Campbell-Valois, F. X., and Michnick, S. W. (1998). Oligomerization domain-directed reassembly of active dihydrofolate reductase from rationally designed fragments. Proc Natl Acad Sci USA 95, 12141-12146.

[0392] Potter, K. N., Li, Y. and Capra, J. D. (1993). Antibody production in the baculovirus expression system. Int Rev Immunol 10, 103-112.

[0393] Price, S. R., Evans, P. R., and Nagai, K. (1998). Crystal structure of the spliceosomal U2B″-U2A′ protein complex bound to a fragment of U2 small nuclear RNA. Nature 394, 645-650.

[0394] Proba, K., Honegger, A., and Plückthun, A. (1997). A natural antibody missing a cysteine in VH: consequences for thermodynamic stability and folding. J Mol Biol 265, 161-172.

[0395] Ridder, R., Schmitz, R., Legay, F. and Gram, H. (1995). Generation of rabbit monoclonal antibody fragments from a combinatorial phage display library and their production in the yeast Pichia pastoris. Bio/Technology 13, 255-260.

[0396] Rogers, S., Wells, R., and Rechsteiner, M. (1986). Amino acid sequences common to rapidly degraded proteins: the PEST hypothesis. Science 234, 364-368.

[0397] Rost, B. (1996). PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol 266, 525-539.

[0398] Sambrook, J., Fritsch, E. F. and Maniatis, T. (1989). Molecular Cloning: A laboratory manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, USA.

[0399] Schmidt, T. G. and Skerra, A. (1993). The random peptide library-assisted engineering of a C-terminal affinity peptide, useful for the detection and purification of a functional Ig Fv fragment. Protein Eng 6, 109-122.

[0400] Schmidt, T. G. and Skerra, A. (1994). One-step affinity purification of bacterially produced proteins by means of the “Strep tag” and immobilised recombinant core streptavidin. J Chromatogr A 676, 337-345.

[0401] Schmidt, T. G., Koepke, J., Frank, R., and Skerra, A. (1996). Molecular interaction between the Strep-tag affinity peptide and its cognate target, streptavidin. J Mol Biol 255, 753-766.

[0402] Schultz, J., Copley, R. R., Doerks, T., Ponting, C. P., and Bork, P. (2000). SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res 28, 231-234.

[0403] Sedgwick, S. G. and Smerdon, S. J. (1999). The ankyrin repeat: a diversity of interactions on a common structural framework. Trends Biochem Sci 24, 311-316.

[0404] Sequeira, E., McEntyre, J., and Lipman, D. (2001). PubMed Central decentralized. Nature 410, 740.

[0405] Sidhu, S. S., Lowman, H. B., and Wells, J. A. (2000). Phage display for selection of novel binding peptides. Methods Enzymol, in the press.

[0406] Smith, G. P. (1985). Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. Science 228, 1315-1317.

[0407] Stemmer, W. P. (1994). DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. Proc Natl Acad Sci USA 91, 10747-10751.

[0408] Suzuki, F., Goto, M., Sawa, C., Ito, S., Watanabe, H., Sawada, J., and Handa, H. (1998). Functional interactions of transcription factor human GA-binding protein subunits. J Biol Chem 273, 29302-29308.

[0409] Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673-4680.

[0410] Trill, J. J., Shatzman, A. R. and Ganguly, S. (1995). Production of monoclonal antibodies in COS and CHO cells. Curr Opin Biotechnol 6, 553-560.

[0411] Venkataramani, R., Swaminathan, K., and Marmorstein, R. (1998). Crystal structure of the CDK4/6 inhibitory protein p18INK4c provides insights into ankyrin-like repeat structure/function and tumor-derived p16INK4 mutations. Nat Struct Biol 5, 74-81.

[0412] Virnekäs, B., Ge, L., Plückthun, A., Schneider, K. C., Wellnhofer, G., and Moroney, S. E. (1994). Trinucleotide phosphoramidites: ideal reagents for the synthesis of mixed oligonucleotides for random mutagenesis. Nucleic Acids Res 22, 5600-5607.

[0413] Volkov, A. A. and Arnold, F. H. (2000). Methods for in vitro DNA recombination and random chimeragenesis. Methods Enzymol 328, 447-456.

[0414] Waldo, G. S., Standish, B. M., Berendzen, J., and Terwilliger, T. C. (1999). Rapid protein-folding assay using green fluorescent protein. Nat Biotechnol 17, 691-695.

[0415] Ward, V. K., Kreissig, S. B., Hammock, B. D. and Choudary, P. V. (1995). Generation of an expression library in the baculovirus expression vector system. J Virol Methods 53, 263-272.

[0416] Whitelam, G. C., Cockburn, W. and Owen, M. R. (1994). Antibody production in transgenic plants. Biochem Soc Trans 22, 940-944.

[0417] Wilson, D. S. and Keefe, A. D. (2000). Random Mutagenesis by PCR. In Current Protocols in Molecular Biology. F. M. Ausubel, R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, J. A. Smith, and K. Struhl, eds. (New York: Wiley).

[0418] Womble, D. D. (2000). GCG: The Wisconsin Package of sequence analysis programs. Methods Mol Biol 132, 3-22.

[0419] Wu, X. C., Ng, S. C., Near, R. I. and Wong, S. L. (1993a). Efficient production of a functional single-chain antidigoxin antibody via an engineered Bacillus subtilis expression-secretion system. Bio/Technology 11, 71-76.

[0420] Wu, Y., Mikulski, S. M., Ardelt, W., Rybak, S. M., and Youle, R. J. (1993b). A cytotoxic ribonuclease. Study of the mechanism of onconase cytotoxicity. J Biol Chem 268, 10686-10693.

[0421] Yang, F., Forrer, P., Dauter, Z., Conway, J. F., Cheng, N., Cerritelli, M. E., Steven, A. C., Plückthun, A., and Wlodawer, A. (2000). Novel fold and capsid-binding properties of the lambda-phage display platform protein gpD. Nat Struct Biol 7, 230-237.

[0422] Yang, W. P., Green, K., Pinz-Sweeney, S., Briones, A. T., Burton, D. R., and Barbas, C. F., 3rd (1995). CDR walking mutagenesis for the affinity maturation of a potent human anti-HIV-1 antibody into the picomolar range. J Mol Biol 254, 392-403.

[0423] Zhang, B., and Peng, Z. (2000). A minimum folding unit in the ankyrin repeat protein p16(INK4). J Mol Biol 299, 1121-1132.

1 77 1 48 PRT Artificial Sequence Description of Artificial SequenceSynthetic ankyrin repeat consensus sequence 1 Asp Xaa Xaa Gly Xaa ThrPro Leu His Leu Ala Xaa Xaa Xaa Xaa Xaa 1 5 10 15 Xaa Xaa Xaa Xaa XaaXaa Xaa Xaa Gly Xaa Xaa Xaa Xaa Val Xaa Xaa 20 25 30 Leu Leu Xaa Xaa GlyAla Xaa Xaa Xaa Xaa Xaa Asp Val Asn Ala Xaa 35 40 45 2 33 PRT ArtificialSequence Description of Artificial Sequence Synthetic ankyrin repeatconsensus sequence 2 Asp Xaa Xaa Gly Xaa Thr Pro Leu His Leu Ala Xaa XaaXaa Gly Xaa 1 5 10 15 Xaa Xaa Val Val Xaa Leu Leu Leu Xaa Xaa Gly AlaAsp Val Asn Ala 20 25 30 Xaa 3 33 PRT Artificial Sequence Description ofArtificial Sequence Synthetic ankyrin repeat sequence motif 3 Asp XaaXaa Gly Xaa Thr Pro Leu His Leu Ala Xaa Xaa Xaa Gly Xaa 1 5 10 15 XaaXaa Ile Val Xaa Val Leu Leu Xaa Xaa Gly Ala Asp Val Asn Ala 20 25 30 Xaa4 33 PRT Artificial Sequence Description of Artificial SequenceSynthetic ankyrin repeat sequence motif 4 Asp Xaa Xaa Gly Xaa Thr ProLeu His Leu Ala Ala Xaa Xaa Gly His 1 5 10 15 Leu Glu Ile Val Glu ValLeu Leu Lys Xaa Gly Ala Asp Val Asn Ala 20 25 30 Xaa 5 30 PRT ArtificialSequence Description of Artificial Sequence Synthetic LRR consensussequence 5 Xaa Leu Xaa Xaa Leu Xaa Leu Xaa Xaa Asn Xaa Xaa Xaa Xaa XaaXaa 1 5 10 15 Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 2025 30 6 28 PRT Artificial Sequence Description of Artificial SequenceSynthetic LRR consensus sequence 6 Xaa Leu Glu Xaa Leu Xaa Leu Xaa XaaCys Xaa Leu Thr Xaa Xaa Xaa 1 5 10 15 Cys Xaa Xaa Leu Xaa Xaa Xaa LeuXaa Xaa Xaa Xaa 20 25 7 29 PRT Artificial Sequence Description ofArtificial Sequence Synthetic LRR consensus sequence 7 Xaa Leu Xaa GluLeu Xaa Leu Xaa Xaa Asn Xaa Leu Gly Asp Xaa Gly 1 5 10 15 Xaa Xaa XaaLeu Xaa Xaa Xaa Leu Xaa Xaa Pro Xaa Xaa 20 25 8 57 PRT ArtificialSequence Description of Artificial Sequence Synthetic peptide construct8 Arg Leu Glu Xaa Leu Xaa Leu Xaa Xaa Xaa Asp Leu Thr Glu Ala Gly 1 5 1015 Xaa Lys Asp Leu Ala Ser Val Leu Arg Ser Asn Pro Ser 
Leu Arg Glu 20 2530 Leu Xaa Leu Ser Xaa Asn Lys Leu Gly Asp Ala Gly Val Arg Leu Leu 35 4045 Leu Gln Gly Leu Leu Asp Pro Gly Thr 50 55 9 171 DNA ArtificialSequence Description of Artificial Sequence Synthetic nucleic acidconstruct 9 cgcctggagn nnctgnnnct gnnnnnnnnn gacctcaccg aggccggcnnnaaggacctg 60 gccagcgtgc tccgctccaa cccgagcctg cgggagctgn nnctgagcnnnaacaagctc 120 ggcgatgcag gcgtgcggct gctcttgcag gggctgctgg accccggcac g171 10 171 DNA Artificial Sequence Description of Artificial SequenceSynthetic nucleic acid construct 10 cgcctggagn nnctgnnnct gnnnnnnnnngacctcaccg aggccggcnn naaggacctg 60 gccagcgtgc tccgctccaa cccgagcctgcgggagctgn nnctgagcnn naacaagctc 120 ggcgatgcag gcgtgcggct gctcttgcaggggctgctgg accccggcac g 171 11 28 PRT Artificial Sequence Description ofArtificial Sequence Synthetic peptide construct 11 Xaa Leu Glu Xaa LeuXaa Leu Xaa Xaa Cys Xaa Leu Thr Xaa Ala Xaa 1 5 10 15 Cys Xaa Xaa LeuXaa Ser Val Leu Xaa Xaa Xaa Xaa 20 25 12 29 PRT Artificial SequenceDescription of Artificial Sequence Synthetic peptide construct 12 SerLeu Xaa Glu Leu Xaa Leu Ser Xaa Asn Xaa Leu Gly Asp Xaa Gly 1 5 10 15Xaa Xaa Xaa Leu Cys Xaa Gly Leu Xaa Xaa Pro Xaa Cys 20 25 13 28 PRTArtificial Sequence Description of Artificial Sequence Synthetic peptideconstruct 13 Xaa Leu Glu Xaa Leu Xaa Leu Xaa Xaa Cys Xaa Leu Thr Ala AlaXaa 1 5 10 15 Cys Xaa Asp Leu Xaa Ser Val Leu Arg Ala Asn Xaa 20 25 1429 PRT Artificial Sequence Description of Artificial Sequence Syntheticpeptide construct 14 Ser Leu Xaa Glu Leu Xaa Leu Ser Xaa Asn Xaa Leu GlyAsp Ala Gly 1 5 10 15 Xaa Xaa Xaa Leu Cys Xaa Gly Leu Xaa Xaa Pro XaaCys 20 25 15 28 PRT Artificial Sequence Description of ArtificialSequence Synthetic peptide construct 15 Xaa Leu Glu Xaa Leu Trp Leu XaaAsp Cys Gly Leu Thr Ala Ala Gly 1 5 10 15 Cys Lys Asp Leu Cys Ser ValLeu Arg Ala Asn Xaa 20 25 16 29 PRT Artificial Sequence Description ofArtificial Sequence Synthetic peptide construct 16 Ser Leu Arg Glu LeuXaa Leu Ser Xaa Asn Xaa Leu Gly Asp Ala Gly 1 
5 10 15 Val Xaa Leu LeuCys Glu Gly Leu Leu Xaa Pro Xaa Cys 20 25 17 28 PRT Artificial SequenceDescription of Artificial Sequence Synthetic peptide construct 17 XaaLeu Glu Lys Leu Trp Leu Glu Asp Cys Gly Leu Thr Ala Ala Gly 1 5 10 15Cys Lys Asp Leu Cys Ser Val Leu Arg Ala Asn Pro 20 25 18 29 PRTArtificial Sequence Description of Artificial Sequence Synthetic peptideconstruct 18 Ser Leu Arg Glu Leu Asp Leu Ser Xaa Asn Glu Leu Gly Asp AlaGly 1 5 10 15 Val Arg Leu Leu Cys Glu Gly Leu Leu Xaa Pro Gly Cys 20 2519 28 PRT Artificial Sequence Description of Artificial SequenceSynthetic peptide construct 19 Arg Leu Glu Lys Leu Trp Leu Glu Asp XaaGly Leu Thr Ala Ala Gly 1 5 10 15 Xaa Lys Asp Leu Ala Ser Val Leu ArgAla Asn Pro 20 25 20 29 PRT Artificial Sequence Description ofArtificial Sequence Synthetic peptide construct 20 Ser Leu Arg Glu LeuAsp Leu Ser Xaa Asn Glu Leu Gly Asp Ala Gly 1 5 10 15 Val Arg Leu LeuLeu Glu Gly Leu Leu Xaa Pro Gly Thr 20 25 21 28 PRT Artificial SequenceDescription of Artificial Sequence Synthetic peptide construct 21 ArgLeu Glu Xaa Leu Xaa Leu Xaa Xaa Xaa Gly Leu Thr Ala Ala Gly 1 5 10 15Xaa Lys Asp Leu Ala Ser Val Leu Arg Ala Asn Pro 20 25 22 29 PRTArtificial Sequence Description of Artificial Sequence Synthetic peptideconstruct 22 Ser Leu Arg Glu Leu Xaa Leu Ser Xaa Asn Glu Leu Gly Asp AlaGly 1 5 10 15 Val Arg Leu Leu Leu Glu Gly Leu Leu Xaa Pro Gly Thr 20 2523 28 PRT Artificial Sequence Description of Artificial SequenceSynthetic peptide construct 23 Arg Leu Glu Xaa Leu Xaa Leu Xaa Xaa XaaAsp Leu Thr Glu Ala Gly 1 5 10 15 Xaa Lys Asp Leu Ala Ser Val Leu ArgSer Asn Pro 20 25 24 29 PRT Artificial Sequence Description ofArtificial Sequence Synthetic peptide construct 24 Ser Leu Arg Glu LeuXaa Leu Ser Xaa Asn Lys Leu Gly Asp Ala Gly 1 5 10 15 Val Arg Leu LeuLeu Gln Gly Leu Leu Asp Pro Gly Thr 20 25 25 12 PRT Artificial SequenceDescription of Artificial Sequence Synthetic amino acid linker 25 GlySer Ala Gly Ser Ala Ala Gly Ser Gly Glu Phe 1 5 10 26 
57 DNA ArtificialSequence Description of Artificial Sequence Synthetic oligonucleotide 26catgccatgg actacaagga tcatcaccat caccatcacg gatccctgga catccag 57 27 47DNA Artificial Sequence Description of Artificial Sequence Syntheticoligonucleotide 27 gcataagctt atcactcgag gcgcgcgtag ggctgctgga gcagagg47 28 25 DNA Artificial Sequence Description of Artificial SequenceSynthetic oligonucleotide 28 gcataagctt atcaggagat gaccc 25 29 32 DNAArtificial Sequence Description of Artificial Sequence Syntheticoligonucleotide 29 catgccatgg gcgcgcctcg agcagctggt cc 32 30 54 DNAArtificial Sequence Description of Artificial Sequence Syntheticoligonucleotide 30 ttggcgcgcc tggagnnnct gnnnctgnnn nnnnnngacctcaccgaggc cggc 54 31 63 DNA Artificial Sequence Description ofArtificial Sequence Synthetic oligonucleotide 31 ccgcaggctc gggttggagcggagcacgct ggccaggtcc ttcangccgg cctcggtgag 60 gtc 63 32 54 DNAArtificial Sequence Description of Artificial Sequence Syntheticoligonucleotide 32 tccaacccga gcctgcggga gctgnnnctg agcnnnaacaagctcggcga tgca 54 33 72 DNA Artificial Sequence Description ofArtificial Sequence Synthetic oligonucleotide 33 ccgctcgaga cgcgtgccggggtccagcag cccctgcaag agcagccgca cgcctgcatc 60 gccgagcttg tt 72 34 35DNA Artificial Sequence Description of Artificial Sequence Syntheticoligonucleotide 34 taatacgact cactataggg ttggcgcgcc tggag 35 35 42 DNAArtificial Sequence Description of Artificial Sequence Syntheticoligonucleotide 35 ggctttgtta gcagccggat cctcgagacg cgtgccgggg tc 42 3624 DNA Artificial Sequence Description of Artificial Sequence Syntheticoligonucleotide 36 aaattaatac gactcactat aggg 24 37 20 DNA ArtificialSequence Description of Artificial Sequence Synthetic oligonucleotide 37cgggctttgt tagcagccgg 20 38 49 DNA Artificial Sequence Description ofArtificial Sequence Synthetic oligonucleotide 38 ctgacgttaa cgctnnngacnnnnnnggtn nnactccgct gcacctggc 49 39 43 DNA Artificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 39actccgctgc acctggctgc tnnnnnnggt 
cacctggaaa tcg 43 40 48 DNA ArtificialSequence Description of Artificial Sequence Synthetic oligonucleotide 40aacgtcagca ccgtncttca gcagaacttc aacgatttcc aggtgacc 48 41 21 DNAArtificial Sequence Description of Artificial Sequence Syntheticoligonucleotide 41 agcagccagg tgcagcggag t 21 42 35 DNA ArtificialSequence Description of Artificial Sequence Synthetic oligonucleotide 42ttccgcggat cctaggaaga cctgacgtta acgct 35 43 35 DNA Artificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 43tttgggaagc ttctaaggtc tcacgtcagc accgt 35 44 22 DNA Artificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 44tttgggaagc ttctaaggtc tc 22 45 35 DNA Artificial Sequence Description ofArtificial Sequence Synthetic oligonucleotide 45 tttgggaagc ttctagaagacaacgtcagc accgt 35 46 64 DNA Artificial Sequence Description ofArtificial Sequence Synthetic oligonucleotide 46 ttccgcggat ccgacctgggtaagaaactg ctggaagctg ctcgtgctgg tcaggacgac 60 gaag 64 47 48 DNAArtificial Sequence Description of Artificial Sequence Syntheticoligonucleotide 47 aacgtcagca ccgttagcca tcaggatacg aacttcgtcg tcctgacc48 48 20 DNA Artificial Sequence Description of Artificial SequenceSynthetic oligonucleotide 48 ttccgcggat ccgacctggg 20 49 13 DNAArtificial Sequence Description of Artificial Sequence Syntheticoligonucleotide 49 ttccgcggat ccg 13 50 69 DNA Artificial SequenceDescription of Artificial Sequence Synthetic oligonucleotide 50ctgacgttaa cgctcaggac aaattcggta agaccgcttt cgacatctcc atcgacaacg 60gtaacgagg 69 51 36 DNA Artificial Sequence Description of ArtificialSequence Synthetic oligonucleotide 51 ttgcaggatt tcagccaggt cctcgttaccgttgtc 36 52 30 DNA Artificial Sequence Description of ArtificialSequence Synthetic oligonucleotide 52 tttgggaagc ttctattgca ggatttcagc30 53 460 PRT Homo sapiens 53 Ser Leu Asp Ile Gln Ser Leu Asp Ile GlnCys Glu Glu Leu Ser Asp 1 5 10 15 Ala Arg Trp Ala Glu Leu Leu Pro LeuLeu Gln Gln Cys Gln Val Val 20 25 30 Arg Leu Asp Asp Cys Gly Leu Thr GluAla 
Arg Cys Lys Asp Ile Ser 35 40 45 Ser Ala Leu Arg Val Asn Pro Ala LeuAla Glu Leu Asn Leu Arg Ser 50 55 60 Asn Glu Leu Gly Asp Val Gly Val HisCys Val Leu Gln Gly Leu Gln 65 70 75 80 Thr Pro Ser Cys Lys Ile Gln LysLeu Ser Leu Gln Asn Cys Cys Leu 85 90 95 Thr Gly Ala Gly Cys Gly Val LeuSer Ser Thr Leu Arg Thr Leu Pro 100 105 110 Thr Leu Gln Glu Leu His LeuSer Asp Asn Leu Leu Gly Asp Ala Gly 115 120 125 Leu Gln Leu Leu Cys GluGly Leu Leu Asp Pro Gln Cys Arg Leu Glu 130 135 140 Lys Leu Gln Leu GluTyr Cys Ser Leu Ser Ala Ala Ser Cys Glu Pro 145 150 155 160 Leu Ala SerVal Leu Arg Ala Lys Pro Asp Phe Lys Glu Leu Thr Val 165 170 175 Ser AsnAsn Asp Ile Asn Glu Ala Gly Val Arg Val Leu Cys Gln Gly 180 185 190 LeuLys Asp Ser Pro Cys Gln Leu Glu Ala Leu Lys Leu Glu Ser Cys 195 200 205Gly Val Thr Ser Asp Asn Cys Arg Asp Leu Cys Gly Ile Val Ala Ser 210 215220 Lys Ala Ser Leu Arg Glu Leu Ala Leu Gly Ser Asn Lys Leu Gly Asp 225230 235 240 Val Gly Met Ala Glu Leu Cys Pro Gly Leu Leu His Pro Ser SerArg 245 250 255 Leu Arg Thr Leu Trp Ile Trp Glu Cys Gly Ile Thr Ala LysGly Cys 260 265 270 Gly Asp Leu Cys Arg Val Leu Arg Ala Lys Glu Ser LeuLys Glu Leu 275 280 285 Ser Leu Ala Gly Asn Glu Leu Gly Asp Glu Gly AlaArg Leu Leu Cys 290 295 300 Glu Thr Leu Leu Glu Pro Gly Cys Gln Leu GluSer Leu Trp Val Lys 305 310 315 320 Ser Cys Ser Phe Thr Ala Ala Cys CysSer His Phe Ser Ser Val Leu 325 330 335 Ala Gln Asn Arg Phe Leu Leu GluLeu Gln Ile Ser Asn Asn Arg Leu 340 345 350 Glu Asp Ala Gly Val Arg GluLeu Cys Gln Gly Leu Gly Gln Pro Gly 355 360 365 Ser Val Leu Arg Val LeuTrp Leu Ala Asp Cys Asp Val Ser Asp Ser 370 375 380 Ser Cys Ser Ser LeuAla Ala Thr Leu Leu Ala Asn His Ser Leu Arg 385 390 395 400 Glu Leu AspLeu Ser Asn Asn Cys Leu Gly Asp Ala Gly Ile Leu Gln 405 410 415 Leu ValGlu Ser Val Arg Gln Pro Gly Cys Leu Leu Glu Gln Leu Val 420 425 430 LeuTyr Asp Ile Tyr Trp Ser Glu Glu Met Glu Asp Arg Leu Gln Ala 435 440 445Leu Glu Lys Asp Lys Pro Ser Leu Arg Val Ile Ser 450 455 460 54 225 PRTHomo sapiens 
54 Met Ser Leu Gln Val Val Arg Leu Asp Asp Cys Gly Leu ThrGlu Ala 1 5 10 15 Arg Cys Lys Asp Ile Ser Ser Ala Leu Arg Val Asn ProLys Ile Gln 20 25 30 Lys Leu Ser Leu Gln Asn Cys Cys Leu Thr Gly Ala GlyCys Gly Val 35 40 45 Leu Ser Ser Thr Leu Arg Thr Leu Pro Arg Leu Glu LysLeu Gln Leu 50 55 60 Glu Tyr Cys Ser Leu Ser Ala Ala Ser Cys Glu Pro LeuAla Ser Val 65 70 75 80 Leu Arg Ala Lys Pro Gln Leu Glu Ala Leu Lys LeuGlu Ser Cys Gly 85 90 95 Val Thr Ser Asp Asn Cys Arg Asp Leu Cys Gly IleVal Ala Ser Lys 100 105 110 Ala Arg Leu Arg Thr Leu Trp Ile Trp Glu CysGly Ile Thr Ala Lys 115 120 125 Gly Cys Gly Asp Leu Cys Arg Val Leu ArgAla Lys Glu Gln Leu Glu 130 135 140 Ser Leu Trp Val Lys Ser Cys Ser PheThr Ala Ala Cys Cys Ser His 145 150 155 160 Phe Ser Ser Val Leu Ala GlnAsn Arg Val Leu Arg Val Leu Trp Leu 165 170 175 Ala Asp Cys Asp Val SerAsp Ser Ser Cys Ser Ser Leu Ala Ala Thr 180 185 190 Leu Leu Ala Asn HisLeu Leu Glu Gln Leu Val Leu Tyr Asp Ile Tyr 195 200 205 Trp Ser Glu GluMet Glu Asp Arg Leu Gln Ala Leu Glu Lys Asp Lys 210 215 220 Pro 225 55222 PRT Sus scrofa 55 Glu Val Val Arg Leu Asp Asp Cys Gly Leu Thr GluGlu His Cys Lys 1 5 10 15 Asp Ile Gly Ser Ala Leu Arg Ala Asn Pro LysIle Gln Lys Leu Ser 20 25 30 Leu Gln Asn Cys Ser Leu Thr Glu Ala Gly CysGly Val Leu Pro Ser 35 40 45 Thr Leu Arg Ser Leu Pro His Leu Glu Lys LeuGln Leu Glu Tyr Cys 50 55 60 Arg Leu Thr Ala Ala Ser Cys Glu Pro Leu AlaSer Val Leu Arg Ala 65 70 75 80 Thr Arg Gln Leu Glu Thr Leu Arg Leu GluAsn Cys Gly Leu Thr Pro 85 90 95 Ala Asn Cys Lys Asp Leu Cys Gly Ile ValAla Ser Gln Ala Arg Leu 100 105 110 Lys Thr Leu Trp Leu Trp Glu Cys AspIle Thr Ala Ser Gly Cys Arg 115 120 125 Asp Leu Cys Arg Val Leu Gln AlaLys Glu Gln Leu Glu Ser Leu Trp 130 135 140 Val Lys Ser Cys Ser Leu ThrAla Ala Cys Cys Gln His Val Ser Leu 145 150 155 160 Met Leu Thr Gln AsnLys Thr Leu Arg Val Leu Cys Leu Gly Asp Cys 165 170 175 Glu Val Thr AsnSer Gly Cys Ser Ser Leu Ala Ser Leu Leu Leu Ala 180 185 190 Asn Arg AlaLeu Glu Gln Leu Val Leu 
Tyr Asp Thr Tyr Trp Thr Glu 195 200 205 Glu ValGlu Asp Arg Leu Gln Ala Leu Glu Gly Ser Lys Pro 210 215 220 56 221 PRTRattus sp. 56 Gln Val Val Arg Leu Asp Asp Cys Gly Leu Thr Glu Val ArgCys Lys 1 5 10 15 Asp Ile Arg Ser Ala Ile Gln Ala Asn Pro Lys Ile GlnLys Leu Ser 20 25 30 Leu Gln Asn Cys Ser Leu Thr Glu Ala Gly Cys Gly ValLeu Pro Asp 35 40 45 Val Leu Arg Ser Leu Ser Leu Glu Lys Leu Gln Leu GluTyr Cys Asn 50 55 60 Leu Thr Ala Thr Ser Cys Glu Pro Leu Ala Ser Val LeuArg Val Lys 65 70 75 80 Pro Gln Leu Glu Ser Leu Lys Leu Glu Asn Cys GlyIle Thr Ser Ala 85 90 95 Asn Cys Lys Asp Leu Cys Asp Val Val Ala Ser LysAla Arg Leu Arg 100 105 110 Thr Leu Trp Leu Trp Asp Cys Asp Val Thr AlaGlu Gly Cys Lys Asp 115 120 125 Leu Cys Arg Val Leu Arg Ala Lys Gln GlnLeu Glu Ser Leu Trp Val 130 135 140 Lys Thr Cys Ser Leu Thr Ala Ala SerCys Pro His Phe Cys Ser Val 145 150 155 160 Leu Thr Lys Asn Ser Val LeuArg Val Leu Trp Leu Gly Asp Cys Asp 165 170 175 Val Thr Asp Ser Gly CysSer Ser Leu Ala Thr Val Leu Leu Ala Asn 180 185 190 Arg Ile Leu Gln GlnLeu Val Leu Tyr Asp Ile Tyr Trp Thr Asp Glu 195 200 205 Val Glu Asp GlnLeu Arg Ala Leu Glu Glu Glu Arg Pro 210 215 220 57 222 PRT Mus musculus57 Glu Val Val Arg Leu Asp Asp Cys Gly Leu Thr Glu Val Arg Cys Lys 1 510 15 Asp Ile Ser Ser Ala Val Gln Ala Asn Pro Lys Ile Gln Lys Leu Ser 2025 30 Leu Gln Asn Cys Gly Leu Thr Glu Ala Gly Cys Gly Ile Leu Pro Gly 3540 45 Met Leu Arg Ser Leu Ser Arg Leu Glu Lys Leu Gln Leu Glu Tyr Cys 5055 60 Asn Leu Thr Ala Thr Ser Cys Glu Pro Leu Ala Ser Val Leu Arg Val 6570 75 80 Lys Ala Gln Leu Glu Ser Leu Lys Leu Glu Asn Cys Gly Ile Thr Ala85 90 95 Ala Asn Cys Lys Asp Leu Cys Asp Val Val Ala Ser Lys Ala Lys Leu100 105 110 Gly Thr Leu Trp Leu Trp Glu Cys Asp Ile Thr Ala Glu Gly CysLys 115 120 125 Asp Leu Cys Arg Val Leu Arg Ala Asn Gln Gln Leu Glu SerLeu Trp 130 135 140 Ile Lys Thr Cys Ser Leu Thr Ala Ala Ser Cys Pro TyrPhe Cys Ser 145 150 155 160 Val Leu Thr Lys Ser Arg Val Leu Arg Glu LeuTrp Leu Gly Asp Cys 165 170 175 
Asp Val Thr Asn Ser Gly Cys Ser Ser LeuAla Asn Val Leu Leu Ala 180 185 190 Asn Arg Thr Leu Gln Gln Leu Val LeuTyr Asp Ile Tyr Trp Thr Asn 195 200 205 Glu Val Glu Glu Gln Leu Arg AlaLeu Glu Glu Gly Arg Pro 210 215 220 58 234 PRT Homo sapiens 58 Asp IleGln Ser Leu Asp Ile Gln Cys Glu Glu Leu Ser Asp Ala Arg 1 5 10 15 TrpAla Glu Leu Leu Pro Leu Leu Gln Gln Cys Ala Leu Ala Glu Leu 20 25 30 AsnLeu Arg Ser Asn Glu Leu Gly Asp Val Gly Val His Cys Val Leu 35 40 45 GlnGly Leu Gln Thr Pro Ser Cys Thr Leu Gln Glu Leu His Leu Ser 50 55 60 AspAsn Leu Leu Gly Asp Ala Leu Gln Leu Leu Cys Glu Gly Leu Leu 65 70 75 80Asp Pro Gln Cys Asp Phe Lys Glu Leu Thr Val Ser Asn Asn Asp Ile 85 90 95Asn Glu Ala Gly Val Arg Val Leu Cys Gln Gly Leu Lys Asp Ser Pro 100 105110 Cys Ser Leu Arg Glu Leu Ala Leu Gly Ser Asn Lys Leu Gly Asp Val 115120 125 Gly Met Ala Glu Leu Cys Pro Gly Leu His Pro Ser Ser Ser Leu Lys130 135 140 Glu Leu Ser Leu Ala Gly Asn Glu Leu Gly Asp Glu Gly Ala ArgLeu 145 150 155 160 Leu Cys Glu Thr Leu Leu Glu Pro Gly Cys Phe Leu LeuGlu Leu Gln 165 170 175 Ile Ser Asn Asn Arg Leu Glu Asp Ala Gly Val ArgGlu Leu Cys Gln 180 185 190 Gly Leu Gly Gln Pro Gly Ser Ser Leu Arg GluLeu Asp Leu Ser Asn 195 200 205 Asn Cys Leu Gly Asp Ala Gly Ile Leu GlnLeu Val Glu Ser Val Arg 210 215 220 Gln Pro Gly Cys Ser Leu Arg Val IleSer 225 230 59 234 PRT Sus scrofa 59 Met Asn Leu Asp Ile His Cys Glu GlnLeu Ser Asp Ala Arg Trp Thr 1 5 10 15 Glu Leu Leu Pro Leu Leu Gln GlnTyr Ser Leu Thr Glu Leu Cys Leu 20 25 30 Arg Thr Asn Glu Leu Gly Asp AlaGly Val His Leu Val Leu Gln Gly 35 40 45 Leu Gln Ser Pro Thr Cys Thr LeuArg Glu Leu His Leu Ser Asp Asn 50 55 60 Pro Leu Gly Asp Ala Gly Leu ArgLeu Leu Cys Glu Gly Leu Leu Asp 65 70 75 80 Pro Gln Cys Ala Leu Lys GluLeu Thr Val Ser Asn Asn Asp Ile Gly 85 90 95 Glu Ala Gly Ala Arg Val LeuGly Gln Gly Leu Ala Asp Ser Ala Cys 100 105 110 Ser Leu Arg Glu Leu AspLeu Gly Ser Asn Gly Leu Gly Asp Ala Gly 115 120 125 Ile Ala Glu Leu CysPro Gly Leu Leu Ser Pro Ala Ser Thr Leu 
Lys 130 135 140 Glu Leu Ser LeuAla Gly Asn Lys Leu Gly Asp Glu Gly Ala Arg Leu 145 150 155 160 Leu CysGlu Ser Leu Leu Gln Pro Gly Cys His Leu Leu Glu Leu Gln 165 170 175 LeuSer Ser Asn Lys Leu Gly Asp Ser Gly Ile Gln Glu Leu Cys Gln 180 185 190Ala Leu Ser Gln Pro Gly Thr Ser Leu Arg Glu Leu Asp Leu Ser Asn 195 200205 Asn Cys Val Gly Asp Pro Gly Val Leu Gln Leu Leu Gly Ser Leu Glu 210215 220 Gln Pro Gly Cys Gly Leu Arg Val Ile Ser 225 230 60 234 PRTRattus sp. 60 Met Ser Leu Asp Ile Gln Cys Glu Gln Leu Ser Asp Ala ArgTrp Thr 1 5 10 15 Glu Leu Leu Pro Leu Ile Gln Gln Tyr Ala Leu Thr GluLeu Ser Leu 20 25 30 Arg Thr Asn Glu Leu Gly Asp Ala Gly Val Gly Leu ValLeu Gln Gly 35 40 45 Leu Gln Asn Pro Thr Cys Thr Leu Arg Glu Leu His LeuAsn Asp Asn 50 55 60 Pro Leu Gly Asp Glu Gly Leu Lys Leu Leu Cys Glu GlyLeu Arg Asp 65 70 75 80 Pro Gln Cys Asp Phe Lys Glu Leu Val Leu Ser AsnAsn Asp Phe His 85 90 95 Glu Ala Gly Ile His Thr Leu Cys Gln Gly Leu LysAsp Ser Ala Cys 100 105 110 Ser Leu Gln Glu Leu Asp Leu Gly Ser Asn LysLeu Gly Asn Thr Gly 115 120 125 Ile Ala Ala Leu Cys Ser Gly Leu Leu LeuPro Ser Cys Ser Leu Lys 130 135 140 Glu Leu Ser Leu Ala Gly Asn Glu LeuLys Asp Glu Gly Ala Gln Leu 145 150 155 160 Leu Cys Glu Ser Leu Leu GluPro Gly Cys Ser Leu Phe Glu Leu Gln 165 170 175 Met Ser Ser Asn Pro LeuGly Asp Ser Gly Val Val Glu Leu Cys Lys 180 185 190 Ala Leu Gly Tyr ProAsp Thr Ser Leu Arg Glu Leu Asp Leu Ser Asn 195 200 205 Asn Cys Met GlyAsp Asn Gly Val Leu Gln Leu Leu Glu Ser Leu Lys 210 215 220 Gln Pro SerCys Ser Leu Arg Ile Ile Ser 225 230 61 233 PRT Mus musculus MOD_RES(168) Variable amino acid 61 Met Ser Leu Asp Ile Gln Cys Glu Gln Leu GlyAsp Ala Arg Trp Thr 1 5 10 15 Glu Leu Leu Pro Leu Ile Gln Gln Tyr AlaLeu Thr Glu Leu Ser Leu 20 25 30 Arg Thr Asn Glu Leu Gly Asp Gly Gly AlaGly Leu Val Leu Gln Gly 35 40 45 Leu Gln Asn Pro Thr Cys Thr Leu Arg GluLeu His Leu Asn Asp Asn 50 55 60 Pro Met Gly Asp Ala Gly Leu Lys Leu LeuCys Glu Gly Leu Gln Asp 65 70 75 80 Pro Gln Cys Asp Phe 
Lys Glu Leu ValSer Asn Asn Asp Leu His Glu 85 90 95 Pro Gly Val Arg Ile Leu Cys Gln GlyLeu Lys Asp Ser Ala Cys Ser 100 105 110 Leu Gln Glu Leu Asp Leu Ser SerAsn Lys Leu Gly Asn Ala Gly Ile 115 120 125 Ala Ala Leu Cys Pro Gly LeuLeu Leu Pro Ser Cys Ser Leu Lys Glu 130 135 140 Leu Ser Leu Ala Ser AsnGlu Leu Lys Asp Glu Gly Ala Arg Leu Leu 145 150 155 160 Cys Glu Ser LeuLeu Glu Pro Xaa Cys Ser Leu Leu Glu Leu Gln Met 165 170 175 Ser Ser AsnPro Leu Gly Asp Glu Gly Val Gln Glu Leu Cys Lys Ala 180 185 190 Leu SerGln Pro Asp Thr Ser Leu Arg Glu Leu Asp Leu Ser Asn Asn 195 200 205 CysMet Gly Gly Pro Gly Val Leu Gln Leu Leu Glu Ser Leu Lys Gln 210 215 220Pro Ser Cys Ser Leu Arg Ile Ile Ser 225 230 62 460 DNA ArtificialSequence Description of Artificial Sequence Synthetic NcoI-HindIIIinsert 62 tcc atg gac tac aag gat cat cac cat cac cat cac gga tcc ctggac 48 Met Asp Tyr Lys Asp His His His His His His Gly Ser Leu Asp 1 510 15 atc cag agc ctg gac atc cag tgt gag gag ctg agc gac gct aga tgg 96Ile Gln Ser Leu Asp Ile Gln Cys Glu Glu Leu Ser Asp Ala Arg Trp 20 25 30gcc gag ctc ctc cct ctg ctc cag cag ccc tac gcg cgc ctg gag nnn 144 AlaGlu Leu Leu Pro Leu Leu Gln Gln Pro Tyr Ala Arg Leu Glu Xaa 35 40 45 ctgnnn ctg nnn nnn nnn gac ctc acc gag gcc ggc ntg aag gac ctg 192 Leu XaaLeu Xaa Xaa Xaa Asp Leu Thr Glu Ala Gly Xaa Lys Asp Leu 50 55 60 gcc agcgtg ctc cgc tcc aac ccg agc ctg cgg gag ctg nnn ctg agc 240 Ala Ser ValLeu Arg Ser Asn Pro Ser Leu Arg Glu Leu Xaa Leu Ser 65 70 75 nnn aac aagctc ggc gat gca ggc gtg cgg ctg ctc ttg cag ggg ctg 288 Xaa Asn Lys LeuGly Asp Ala Gly Val Arg Leu Leu Leu Gln Gly Leu 80 85 90 95 ctg gac cccggc acg cgt ctc gag cag ctg gtc ctg tac gac att tac 336 Leu Asp Pro GlyThr Arg Leu Glu Gln Leu Val Leu Tyr Asp Ile Tyr 100 105 110 tgg tct gaggag atg gag gac cgg ctg cag gcc ctg gag aag gac aag 384 Trp Ser Glu GluMet Glu Asp Arg Leu Gln Ala Leu Glu Lys Asp Lys 115 120 125 cca tcc ctgagg gtc atc tcc ggt tcc gct ggc tcc gct gct ggt tct 432 Pro Ser Leu ArgVal 
Ile Ser Gly Ser Ala Gly Ser Ala Ala Gly Ser 130 135 140 ggc gag gctagc gaa ttc tgataagctt 460 Gly Glu Ala Ser Glu Phe 145 63 149 PRTArtificial Sequence Description of Artificial Sequence Translated aminoacid sequence of the synthetic NcoI- HindIII insert 63 Met Asp Tyr LysAsp His His His His His His Gly Ser Leu Asp Ile 1 5 10 15 Gln Ser LeuAsp Ile Gln Cys Glu Glu Leu Ser Asp Ala Arg Trp Ala 20 25 30 Glu Leu LeuPro Leu Leu Gln Gln Pro Tyr Ala Arg Leu Glu Xaa Leu 35 40 45 Xaa Leu XaaXaa Xaa Asp Leu Thr Glu Ala Gly Xaa Lys Asp Leu Ala 50 55 60 Ser Val LeuArg Ser Asn Pro Ser Leu Arg Glu Leu Xaa Leu Ser Xaa 65 70 75 80 Asn LysLeu Gly Asp Ala Gly Val Arg Leu Leu Leu Gln Gly Leu Leu 85 90 95 Asp ProGly Thr Arg Leu Glu Gln Leu Val Leu Tyr Asp Ile Tyr Trp 100 105 110 SerGlu Glu Met Glu Asp Arg Leu Gln Ala Leu Glu Lys Asp Lys Pro 115 120 125Ser Leu Arg Val Ile Ser Gly Ser Ala Gly Ser Ala Ala Gly Ser Gly 130 135140 Glu Ala Ser Glu Phe 145 64 927 DNA Artificial Sequence Descriptionof Artificial Sequence Synthetic NcoI-HindIII insert 64 cc atg gac tacaag gat cat cac cat cac cat cac gga tcc ctg gac 47 Met Asp Tyr Lys AspHis His His His His His Gly Ser Leu Asp 1 5 10 15 atc cag agc ctg gacatc cag tgt gag gag ctg agc gac gct aga tgg 95 Ile Gln Ser Leu Asp IleGln Cys Glu Glu Leu Ser Asp Ala Arg Trp 20 25 30 gcc gag ctc ctc cct ctgctc cag cag ccc tac gcg cgc ctg gag aag 143 Ala Glu Leu Leu Pro Leu LeuGln Gln Pro Tyr Ala Arg Leu Glu Lys 35 40 45 ctg gat ctg aat gat act gacctc acc gag gcc ggc gtg aag gac ctg 191 Leu Asp Leu Asn Asp Thr Asp LeuThr Glu Ala Gly Val Lys Asp Leu 50 55 60 gcc agc gtg ctc cgc tcc aac ccgagc ctg cgg gag ctg tct ctg agc 239 Ala Ser Val Leu Arg Ser Asn Pro SerLeu Arg Glu Leu Ser Leu Ser 65 70 75 act aac aag ctc ggc gat gca ggc gtgcgg ctg ctc ttg cag ggg ctg 287 Thr Asn Lys Leu Gly Asp Ala Gly Val ArgLeu Leu Leu Gln Gly Leu 80 85 90 95 ctg gac ccc ggc acg cgc ctg gag aagctg tat ctg gag cat aat gac 335 Leu Asp Pro Gly Thr Arg Leu Glu Lys LeuTyr Leu Glu His Asn 
Asp 100 105 110 ctc acc gag gcc ggc ctg aag gac ctggcc agc gtg ctc cgc tcc aac 383 Leu Thr Glu Ala Gly Leu Lys Asp Leu AlaSer Val Leu Arg Ser Asn 115 120 125 ccg agc ctg cgg gag ctg aat ctg agcgat aac aag ctc ggc gat gca 431 Pro Ser Leu Arg Glu Leu Asn Leu Ser AspAsn Lys Leu Gly Asp Ala 130 135 140 ggc gtg cgg ctg ctc ttg cag ggg ctgctg gac ccc ggc acg cgc ctg 479 Gly Val Arg Leu Leu Leu Gln Gly Leu LeuAsp Pro Gly Thr Arg Leu 145 150 155 gag gag ctg cag ctg cgt aat act gacctc acc gag gcc ggc gtg gag 527 Glu Glu Leu Gln Leu Arg Asn Thr Asp LeuThr Glu Ala Gly Val Glu 160 165 170 175 gac ctg gcc agc gtg ctc cgc tccaac ccg agc ctg cgg gag ctg tct 575 Asp Leu Ala Ser Val Leu Arg Ser AsnPro Ser Leu Arg Glu Leu Ser 180 185 190 ctg agc aat aac aag ctc ggc gatgca ggc gtg cgg ctg ctc ttg cag 623 Leu Ser Asn Asn Lys Leu Gly Asp AlaGly Val Arg Leu Leu Leu Gln 195 200 205 ggg ctg ctg gac ccc ggc acg cgcctg gag aag ctg tat ctg cgt aat 671 Gly Leu Leu Asp Pro Gly Thr Arg LeuGlu Lys Leu Tyr Leu Arg Asn 210 215 220 act gac ctc acc gag gcc ggc atgaag gac ctg gcc agc gtg ctc cgc 719 Thr Asp Leu Thr Glu Ala Gly Met LysAsp Leu Ala Ser Val Leu Arg 225 230 235 tcc aac ccg agc ctg cgg gag ctgtct ctg agc act aac aag ctc ggc 767 Ser Asn Pro Ser Leu Arg Glu Leu SerLeu Ser Thr Asn Lys Leu Gly 240 245 250 255 gat gca ggc gtg cgg ctg ctcttg cag ggg ctg ctg gac ctc ggc acg 815 Asp Ala Gly Val Arg Leu Leu LeuGln Gly Leu Leu Asp Leu Gly Thr 260 265 270 cgc ctc gag cag ctg gtc ctgtac gac att tac tgg tct gag gag atg 863 Arg Leu Glu Gln Leu Val Leu TyrAsp Ile Tyr Trp Ser Glu Glu Met 275 280 285 gag gac cgg ctg cag gcc ctggag aag gac aag cca tcc ctg agg gtc 911 Glu Asp Arg Leu Gln Ala Leu GluLys Asp Lys Pro Ser Leu Arg Val 290 295 300 atc tcc tgataagctt 927 IleSer 305 65 305 PRT Artificial Sequence Description of ArtificialSequence Translated amino acid sequence of the synthetic NcoI- HindIIIinsert 65 Met Asp Tyr Lys Asp His His His His His His Gly Ser Leu AspIle 1 5 10 15 Gln Ser Leu Asp Ile Gln Cys 
Glu Glu Leu Ser Asp Ala ArgTrp Ala 20 25 30 Glu Leu Leu Pro Leu Leu Gln Gln Pro Tyr Ala Arg Leu GluLys Leu 35 40 45 Asp Leu Asn Asp Thr Asp Leu Thr Glu Ala Gly Val Lys AspLeu Ala 50 55 60 Ser Val Leu Arg Ser Asn Pro Ser Leu Arg Glu Leu Ser LeuSer Thr 65 70 75 80 Asn Lys Leu Gly Asp Ala Gly Val Arg Leu Leu Leu GlnGly Leu Leu 85 90 95 Asp Pro Gly Thr Arg Leu Glu Lys Leu Tyr Leu Glu HisAsn Asp Leu 100 105 110 Thr Glu Ala Gly Leu Lys Asp Leu Ala Ser Val LeuArg Ser Asn Pro 115 120 125 Ser Leu Arg Glu Leu Asn Leu Ser Asp Asn LysLeu Gly Asp Ala Gly 130 135 140 Val Arg Leu Leu Leu Gln Gly Leu Leu AspPro Gly Thr Arg Leu Glu 145 150 155 160 Glu Leu Gln Leu Arg Asn Thr AspLeu Thr Glu Ala Gly Val Glu Asp 165 170 175 Leu Ala Ser Val Leu Arg SerAsn Pro Ser Leu Arg Glu Leu Ser Leu 180 185 190 Ser Asn Asn Lys Leu GlyAsp Ala Gly Val Arg Leu Leu Leu Gln Gly 195 200 205 Leu Leu Asp Pro GlyThr Arg Leu Glu Lys Leu Tyr Leu Arg Asn Thr 210 215 220 Asp Leu Thr GluAla Gly Met Lys Asp Leu Ala Ser Val Leu Arg Ser 225 230 235 240 Asn ProSer Leu Arg Glu Leu Ser Leu Ser Thr Asn Lys Leu Gly Asp 245 250 255 AlaGly Val Arg Leu Leu Leu Gln Gly Leu Leu Asp Leu Gly Thr Arg 260 265 270Leu Glu Gln Leu Val Leu Tyr Asp Ile Tyr Trp Ser Glu Glu Met Glu 275 280285 Asp Arg Leu Gln Ala Leu Glu Lys Asp Lys Pro Ser Leu Arg Val Ile 290295 300 Ser 305 66 12 DNA Artificial Sequence Description of ArtificialSequence Synthetic construct; type IIs restriction enzyme 66 gaagacnnnnnn 12 67 11 DNA Artificial Sequence Description of Artificial SequenceSynthetic construct; type IIs restriction enzyme 67 ggtctcnnnn n 11 6830 PRT Artificial Sequence Description of Artificial Sequence Syntheticpeptide construct 68 Xaa Gly Xaa Thr Pro Leu His Leu Ala Ala Xaa Xaa GlyHis Xaa Glu 1 5 10 15 Val Val Lys Leu Leu Leu Xaa Xaa Gly Ala Asp ValAsn Xaa 20 25 30 69 33 PRT Artificial Sequence Description of ArtificialSequence Synthetic peptide construct 69 Asp Xaa Xaa Gly Xaa Thr Pro LeuHis Leu Ala Ala Xaa Xaa Gly His 1 5 10 15 Leu Glu Val 
Val Lys Leu LeuLeu Glu Asn Gly Ala Asp Val Asn Ala 20 25 30 Xaa 70 33 PRT ArtificialSequence Description of Artificial Sequence Synthetic peptide construct70 Val Lys Leu Leu Leu Glu Ala Gly Ala Asp Val Asn Ala Arg Asp Ser 1 510 15 Asp Gly Asn Thr Pro Leu His Leu Ala Ala Glu Asn Gly Gln Leu Glu 2025 30 Val 71 33 PRT Artificial Sequence Description of ArtificialSequence Synthetic peptide construct 71 Asp Xaa Xaa Gly Xaa Thr Pro LeuHis Leu Ala Ala Xaa Xaa Gly His 1 5 10 15 Leu Glu Val Val Glu Val LeuLeu Lys His Gly Ala Asp Val Asn Ala 20 25 30 Xaa 72 33 PRT ArtificialSequence Description of Artificial Sequence Synthetic ankyrin repeatmotif sequence 72 Val Asn Ala Xaa Asp Xaa Xaa Gly Xaa Thr Pro Leu HisLeu Ala Ala 1 5 10 15 Xaa Xaa Gly His Leu Glu Ile Val Glu Val Leu LeuLys Xaa Gly Ala 20 25 30 Asp 73 166 PRT Artificial Sequence Descriptionof Artificial Sequence Synthetic E3-5 amino acid sequence 73 Met Arg GlySer His His His His His His Gly Ser Asp Leu Gly Lys 1 5 10 15 Lys LeuLeu Glu Ala Ala Arg Ala Gly Gln Asp Asp Glu Val Arg Ile 20 25 30 Leu MetAla Asn Gly Ala Asp Val Asn Ala Thr Asp Asn Asp Gly Tyr 35 40 45 Thr ProLeu His Leu Ala Ala Ser Asn Gly His Leu Glu Ile Val Glu 50 55 60 Val LeuLeu Lys Asn Gly Ala Asp Val Asn Ala Ser Asp Leu Thr Gly 65 70 75 80 IleThr Pro Leu His Leu Ala Ala Ala Thr Gly His Leu Glu Ile Val 85 90 95 GluVal Leu Leu Lys His Gly Ala Asp Val Asn Ala Tyr Asp Asn Asp 100 105 110Gly His Thr Pro Leu His Leu Ala Ala Lys Tyr Gly His Leu Glu Ile 115 120125 Val Glu Val Leu Leu Lys His Gly Ala Asp Val Asn Ala Gln Asp Lys 130135 140 Phe Gly Lys Thr Ala Phe Asp Ile Ser Ile Asp Asn Gly Asn Glu Asp145 150 155 160 Leu Ala Glu Ile Leu Gln 165 74 153 PRT Mus musculus 74Asp Leu Gly Lys Lys Leu Leu Glu Ala Ala Arg Ala Gly Gln Asp Asp 1 5 1015 Glu Val Arg Ile Leu Met Ala Asn Gly Ala Pro Phe Thr Thr Asp Trp 20 2530 Leu Gly Thr Ser Pro Leu His Leu Ala Ala Gln Tyr Gly His Phe Ser 35 4045 Thr Thr Glu Val Leu Leu Arg Ala Gly Val Ser Arg Asp Ala Arg Thr 50 5560 Lys Val 
Asp Arg Thr Pro Leu His Met Ala Ala Ser Glu Gly His Ala 65 70 75 80 Asn Ile Val Glu Val Leu Leu Lys His Gly Ala Asp Val Asn Ala Lys 85 90 95 Asp Met Leu Lys Met Thr Ala Leu His Trp Ala Thr Glu His Asn His 100 105 110 Gln Glu Val Val Glu Leu Leu Ile Lys Tyr Gly Ala Asp Val His Thr 115 120 125 Gln Ser Lys Phe Cys Lys Thr Ala Phe Asp Ile Ser Ile Asp Asn Gly 130 135 140 Asn Glu Asp Leu Ala Glu Ile Leu Gln 145 150 75 4 PRT Artificial Sequence Description of Artificial Sequence Synthetic peptide linker 75 Pro Tyr Ala Arg 1 76 6 PRT Artificial Sequence Description of Artificial Sequence Synthetic 6X-His tag 76 His His His His His His 1 5 77 5 PRT Artificial Sequence Description of Artificial Sequence Synthetic flag-tag sequence 77 Met Asp Tyr Lys Asp 1 5

1. A collection of nucleic acid molecules encoding a collection of repeat proteins, each repeat protein comprising a repeat domain, which comprises a set of consecutive repeat modules, wherein each of said repeat modules is derived from one or more repeat units of one family of naturally occurring repeat proteins, wherein said repeat units comprise framework residues and target interaction residues, wherein said repeat proteins differ in at least one position.
2. The collection of claim 1, wherein each of said repeat modules has an amino acid sequence, wherein at least 70% of the amino acid residues correspond either (i) to consensus amino acid residues deduced from the amino acid residues found at the corresponding positions of at least two naturally occurring repeat units; or (ii) to the amino acid residues found at the corresponding positions in a naturally occurring repeat unit.
3. The collection of claim 1 or 2, wherein said set consists of between two and about 30 repeat modules.
4. The collection of any one of claims 1 to 3, wherein said repeat modules are directly connected.
5. The collection of any one of claims 1 to 3, wherein said repeat modules are connected by a (poly)peptide linker.
6. The collection of any one of claims 1 to 5, wherein said repeat domain further comprises an N- and/or a C-terminal capping module having an amino acid sequence different from any one of said repeat modules.
7. The collection of any one of claims 1 to 6, wherein said repeat units are ankyrin repeats.
8. The collection of claim 7, wherein each of said repeat modules comprises the ankyrin repeat sequence motif DxxGxTPLHLAaxx±±±±±±±±±±GpxpaVpxLLpxGA±±±±±DVNAx, wherein “x” denotes any amino acid, “±” denotes any amino acid or a deletion, “a” denotes an amino acid with an apolar side chain, and “p” denotes a residue with a polar side chain.
9. The collection of claim 7, wherein each of said repeat modules comprises the ankyrin repeat sequence motif DxxGxTPLHLAxxxGxxxVVxLLLxxGADVNAx, wherein “x” denotes any amino acid.
10. The diverse collection of claim 7, wherein each of said repeat modules comprises the ankyrin repeat sequence motif DxxGxTPLHLAxxxGxxxIVxVLLxxGADVNAx, wherein “x” denotes any amino acid.
11. The collection of any one of claims 8 to 10, wherein one or more of the positions denoted “x” are randomised.
12. The diverse collection of claim 7, wherein each of said repeat modules comprises the ankyrin repeat sequence motif D11G1TPLHLAA11GHLEIVEVLLK2GADVNA1, wherein 1 represents an amino acid residue selected from the group: A, D, E, F, H, I, K, L, M, N, Q, R, S, T, V, W and Y; wherein 2 represents an amino acid residue selected from the group: H, N and Y.
13. The collection of any one of claims 1 to 6, wherein said repeat units are leucine-rich repeats (LRR).
14. The collection of claim 13, wherein each of said modules comprises the LRR sequence motif xLxxLxLxxN±xaxx±a±±±±a±±a±±x±±, wherein “x” denotes any amino acid, “a” denotes an aliphatic amino acid, and “±” denotes any amino acid or a deletion.
15. The collection of claim 13, wherein at least one of said modules comprises the LRR sequence motif xLExLxLxxCxLTxxxCxxLxxaLxxxx, wherein “x” denotes any amino acid, and “a” denotes an aliphatic amino acid (A-type LRR).
16. The collection of claim 13, wherein at least one of said modules comprises the LRR sequence motif xLxELxLxxNxLGDxGaxxLxxxLxxPxx, wherein “x” denotes any amino acid, and “a” denotes an aliphatic amino acid (B-type LRR).
17. The collection of any one of claims 14 to 16, wherein one or more of the positions denoted “x” and/or “±” are randomised.
18. The collection of claim 15, wherein the cysteine residue at position 10 in the A-type LRR consensus sequence is replaced by a hydrophilic amino acid residue, and wherein the cysteine residue at position 17 is replaced by a hydrophobic amino acid residue.
19. The collection of any one of claims 8 to 12 or 14 to 18, wherein one or more of the amino acid residues in said consensus sequences are exchanged by an amino acid residue found at the corresponding position in a corresponding naturally occurring repeat unit.
20. The collection of any one of claims 1 to 19, wherein said set consists of one type of repeat modules.
21. The collection of any one of claims 1 to 19, wherein said set consists of two different types of repeat modules.
22. The collection of claim 20, wherein said set comprises two different types of consecutive repeat modules as pairs in said repeat domain.
23. The collection of claim 21 or 22, wherein said two different types of modules are based on said A-type LRR and B-type LRR.
24. The collection of any one of claims 20 to 23, wherein the amino acid sequences of the repeat modules comprised in said set are identical for each said type except for the randomised residues.
25. The collection of claim 24, wherein the nucleic acid sequences encoding the copies of each said type are identical except for the codons encoding amino acid residues at positions being randomised.
26. The collection of any one of claims 1 to 25, wherein said nucleic acid molecules comprise identical nucleic acid sequences of at least 9 nucleotides between said repeat modules.
27. The collection of claims 22 or 23, wherein said nucleic acid molecules comprise identical nucleic acid sequences of at least 9 nucleotides between said pairs.
28. The collection of any one of claims 1 to 26, wherein each of the nucleic acid sequences between said modules, or said pairs, comprises a restriction enzyme recognition sequence.
29. The collection of any one of claims 1 to 27, wherein each of the nucleic acid sequences between said modules, or said pairs, comprises a nucleic acid sequence formed from cohesive ends created by two compatible restriction enzymes.
30. The collection of claims 26 or 27, wherein said identical nucleic acid sequences allow a PCR-based assembly of said nucleic acid molecules.
31. The collection of claim 24, wherein said repeat domain comprises one or more pairs of modules based on said A-type LRR and B-type LRR, wherein each of said pairs has the sequence RLE1L1L12DLTEAG4KDLASVLRSNPSLREL3LS3NKLGDAGVRLLLQGLLDPGT, wherein 1 represents an amino acid residue selected from the group: D, E, N, Q, S, R, K, W and Y; wherein 2 represents an amino acid residue selected from the group: N, S and T; wherein 3 represents an amino acid residue selected from the group: G, S, D, N, H and T; and wherein 4 represents an amino acid residue selected from the group: L, V and M.
32. The collection of claim 31, wherein each of said pairs of modules is encoded by the nucleic acid molecule CGC CTG GAG 111 CTG 111 CTG 111 111 222 GAC CTC ACC GAG GCC GGC 444 AAG GAC CTG GCC AGC GTG CTC CGC TCC AAC CCG AGC CTG CGG GAG CTG 333 CTG AGC 333 AAC AAG CTC GGC GAT GCA GGC GTG CGG CTG CTC TTG CAG GGG CTG CTG GAC CCC GGC ACG, wherein 111 represents a codon encoding an amino acid residue selected from the group: D, E, N, Q, S, R, K, W and Y; wherein 222 represents a codon encoding an amino acid residue selected from the group: N, S and T; wherein 333 represents a codon encoding an amino acid residue selected from the group: G, S, D, N, H and T; and wherein 444 represents a codon encoding an amino acid residue selected from the group: L, V and M.
33. The collection of claim 31, wherein one or more of the amino acid residues in at least one of said pairs of modules are exchanged by an amino acid residue found at the corresponding position in a naturally occurring LRR.
34. The collection of claim 32, wherein one or more of the amino acid codons in at least one of said pairs of modules are exchanged by a codon encoding an amino acid residue found at the corresponding position in a naturally occurring LRR.
35. A collection of recombinant nucleic acid molecules comprising a collection of nucleic acid molecules according to any one of claims 1 to 34.
36. A collection of vectors comprising a collection of nucleic acid molecules according to any one of claims 1 to 34, or a collection of recombinant nucleic acid molecules according to claim 35.
37. A collection of host cells comprising a collection of nucleic acid molecules according to any one of claims 1 to 34, a collection of recombinant nucleic acid molecules according to claim 35, or a collection of vectors according to claim 36.
38. A collection of repeat proteins encoded by a collection of nucleic acid molecules according to any one of claims 1 to 34, by a collection of recombinant nucleic acid molecules according to claim 35, by a collection of vectors according to claim 36, or produced by a collection of host cells according to claim 37.
39. A method for the construction of a collection of nucleic acid molecules according to any one of claims 1 to 34, comprising the steps of (a) identifying a repeat unit from a repeat protein family; (b) identifying framework residues and target interaction residues in said repeat unit; (c) deducing at least one type of repeat module comprising framework residues and randomised target interaction residues from at least one member of said repeat protein family; and (d) constructing nucleic acid molecules each encoding a repeat protein comprising two or more copies of said at least one type of repeat module deduced in step (c).
40. The method of claim 38, wherein said at least one repeat module deduced in step (c) has an amino acid sequence, wherein at least 70% of the amino acid residues correspond either (i) to consensus amino acid residues deduced from the amino acid residues found at the corresponding positions of at least two naturally occurring repeat units; or (ii) to the amino acid residues found at the corresponding positions in a naturally occurring repeat unit.
41. A method for the production of a collection of repeat proteins according to claim 38, comprising the steps of (a) providing a collection of host cells according to claim 37; and (b) expressing the collection of nucleic acid molecules comprised in said host cells.
42. A method for obtaining a repeat protein having a predetermined property, comprising the steps of (a) providing a collection of repeat proteins according to claim 38 or 39 or produced according to claim 41; and (b) screening said collection and/or selecting from said collection to obtain at least one repeat protein having said predetermined property.
43. The method of claim 42, wherein said predetermined property is binding to a target.
44. A repeat protein from a collection according to any one of claims 24 to 34.
45. A nucleic acid molecule encoding the repeat protein of claim 44.
46. A vector containing the nucleic acid molecule of claim 45.
47. A pharmaceutical composition comprising the repeat protein of claim 44 or the nucleic acid molecule of claim 45, and optionally a pharmaceutically acceptable carrier and/or diluent.
48. A nucleic acid molecule encoding a pair of repeat modules for the construction of a collection according to claims 31 or 32, wherein said nucleic acid molecule is: CGC CTG GAG 111 CTG 111 CTG 111 111 222 GAC CTC ACC GAG GCC GGC 444 AAG GAC CTG GCC AGC GTG CTC CGC TCC AAC CCG AGC CTG CGG GAG CTG 333 CTG AGC 333 AAC AAG CTC GGC GAT GCA GGC GTG CGG CTG CTC TTG CAG GGG CTG CTG GAC CCC GGC ACG, wherein 111 represents a codon encoding an amino acid residue selected from the group: D, E, N, Q, S, R, K, W and Y; wherein 222 represents a codon encoding an amino acid residue selected from the group: N, S and T; wherein 333 represents a codon encoding an amino acid residue selected from the group: G, S, D, N, H and T; and wherein 444 represents a codon encoding an amino acid residue selected from the group: L, V and M.
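
For illustration only, and not as part of the claimed subject matter, the codon template recited in claim 48 can be read programmatically. The following Python sketch translates the fixed codons with the standard genetic code, keeps each variable codon as its digit, and counts the amino acid sequences that one module pair could theoretically adopt. The names `CODON_TO_AA`, `VARIABLE_CODONS`, `residue_template` and `theoretical_diversity` are hypothetical helpers introduced here. The count assumes that every variable codon position is occupied independently by any member of its recited group.

```python
from math import prod

# Standard genetic code, restricted to the fixed codons occurring in the template.
CODON_TO_AA = {
    "CGC": "R", "CGG": "R", "CTG": "L", "CTC": "L", "TTG": "L",
    "GAG": "E", "GAC": "D", "GAT": "D", "ACC": "T", "ACG": "T",
    "GCC": "A", "GCA": "A", "GGC": "G", "GGG": "G", "AAG": "K",
    "AGC": "S", "TCC": "S", "GTG": "V", "AAC": "N", "CCG": "P",
    "CCC": "P", "CAG": "Q",
}

# Residue groups permitted at the placeholder codons, as recited in the claim.
VARIABLE_CODONS = {
    "111": "DENQSRKWY",
    "222": "NST",
    "333": "GSDNHT",
    "444": "LVM",
}

# Codon template of one A-type/B-type LRR module pair (assumed copied verbatim).
TEMPLATE = (
    "CGC CTG GAG 111 CTG 111 CTG 111 111 222 GAC CTC ACC GAG GCC GGC 444 "
    "AAG GAC CTG GCC AGC GTG CTC CGC TCC AAC CCG AGC CTG CGG GAG CTG 333 "
    "CTG AGC 333 AAC AAG CTC GGC GAT GCA GGC GTG CGG CTG CTC TTG CAG GGG "
    "CTG CTG GAC CCC GGC ACG"
)


def residue_template(template: str) -> str:
    """Translate fixed codons; show each placeholder codon as its digit (111 -> '1')."""
    residues = []
    for codon in template.split():
        if codon in VARIABLE_CODONS:
            residues.append(codon[0])
        else:
            residues.append(CODON_TO_AA[codon])
    return "".join(residues)


def theoretical_diversity(template: str) -> int:
    """Count the distinct amino acid sequences one module pair can take."""
    return prod(
        len(VARIABLE_CODONS[codon])
        for codon in template.split()
        if codon in VARIABLE_CODONS
    )


if __name__ == "__main__":
    print(residue_template(TEMPLATE))
    print(theoretical_diversity(TEMPLATE))
```

Under the stated independence assumption, the diversity of a repeat domain with n such pairs would be the per-pair value raised to the power n. Practical library sizes are limited by transformation and display efficiency and are not modelled here.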